java - ANTLR4 Accepting additional tokens as valid?


I'm building a small rule language as a test and have used ANTLR for it. I'm using ANTLR v4 and have the following grammar, split up as follows:

Lexer.g4

    lexer grammar Lexer;

    /*------------------------------------------------------------------
     * Lexer rules - generic keywords
     *------------------------------------------------------------------*/
    NOT
        : 'not'
        ;

    NULL
        : 'null'
        ;

    AND
        : 'and'
        | '&'
        ;

    /*------------------------------------------------------------------
     * Lexer rules - pattern matching
     *------------------------------------------------------------------*/
    DELIM
        : [\|\\/:,&@+><^]
        ;

    WS
        : [ \t\r\n]+ -> skip
        ;

    VALUE
        : SQUOTE TEXT SQUOTE
        ;

    fragment SQUOTE
        : '\''
        ;

    fragment TEXT
        : ( 'a'..'z'
          | 'A'..'Z'
          | '0'..'9'
          | '-'
          )+
        ;

Attribute.g4

    grammar Attribute;

    /*------------------------------------------------------------------
     * Semantic predicate
     *
     * Attributes are capitalised words that may have spaces. They're
     * loaded from the database and set in the glue code so they can be
     * cross-checked here. If the grammar passed in sees an attribute,
     * it will pass as long as the attribute is in the database;
     * otherwise the grammar will fail to parse.
     *------------------------------------------------------------------*/
    attr
        : a=ATTR {attributes.contains($a.text)}?
        ;

    ATTR
        : ([A-Z][a-zA-Z0-9/]+([ ][A-Z][a-zA-Z0-9/]+)?)
        ;

ReplaceInWith.g4

    grammar ReplaceInWith;

    /*------------------------------------------------------------------
     * Replace in parser rules
     *------------------------------------------------------------------*/
    replace_in_with
        : rep in with {row.put($in.value    , $in.value.replace($rep.value, $with.value));}
        | repatt with {row.put($repatt.value, $with.value);}
        ;

    rep returns[String value]
        : REPLACE v=VALUE {$value = trimQuotes($v.text);}
        ;

    repatt returns[String value]
        : REPLACE a=attr  {$value = $a.text;}
        ;

    in returns[String value]
        : IN a=attr {$value = $a.text;}
        ;

    with returns[String value]
        : WITH v=VALUE {$value = trimQuotes($v.text);}
        ;

    /*------------------------------------------------------------------
     * Lexer rules - keywords
     *------------------------------------------------------------------*/
    REPLACE
        : 'rep'
        | 'replace'
        ;

    IN
        : 'in'
        ;

    WITH
        : 'with'
        ;

Parser.g4

    grammar Parser;

    /*------------------------------------------------------------------
     * Imported rules
     *------------------------------------------------------------------*/
    import
        // essential imports
        Attribute,
        GlueCode,
        Lexer,

        // actual rules
        ReplaceInWith;

    /*------------------------------------------------------------------
     * Parser rules
     * You must add each top-level rule here for it to be callable
     *------------------------------------------------------------------*/
    eval
        : replace_in_with
        ;

GlueCode.g4

Java to supply static calling functionality to the grammar and to set the attributes from the database.

ParserErrorListener.java

    import org.antlr.v4.runtime.ParserRuleContext;
    import org.antlr.v4.runtime.misc.NotNull;

    public class ParserErrorListener extends ParserBaseListener
    {
        /**
         * After every rule, check to see if an exception was thrown; if so, exit with a
         * runtime exception to indicate a parser problem.<p>
         */
        @Override
        public void exitEveryRule(@NotNull ParserRuleContext ctx)
        {
            super.exitEveryRule(ctx);
            if (ctx.exception != null)
            {
                throw new ParserRuntimeException(String.format("error evaluating expression(s) '%s'", ctx.exception));
            } //if
        } //exitEveryRule
    } //class

When I supply the following statements, the grammar passes as expected:

    "replace 'acme' in Name with 'acme'",
    "rep 'acme' in Name with 'acme'",
    "replace 'acme' in Name with 'acme'",
    "rep 'acme' in Name with 'acme'",
    "replace 'e' in Name with 'i'",
    "rep 'e' in Name with 'i'",
    "replace '-' in Number with ' '",
    "rep '-' in Number with ' '",
    "replace '555' in Number with '00555'",
    "rep '555' in Number with '00555'"

where Name and Number are set up as attributes for the semantic predicate.

However, when I pass in the following statements the grammar still passes, and I'm not sure why it matches:

    "replace any 'acme' in Name with 'acme'",
    "replaceany 'acme' in Name with 'acme'",

Again, Name is passed in as an attribute and matched by the semantic predicate; that part of the grammar works in my tests. The part that's failing is the 'any' part. The grammar matches replace and then gets the next token, which it thinks is 'acme', ignoring the 'any' part in both examples above. What I was expecting here is for the grammar to fail: in the listener, on exit of every rule, I have added a check that should throw a runtime exception, which is caught by the glue code to indicate failure.

Any ideas on how I can get the grammar to throw an error when this occurs?

  1. First and foremost, lexer rules are global in ANTLR. Every token in the input will be assigned one, and only one, token type. If you separate lexer rules into multiple files, it becomes a maintenance nightmare to determine the cases where tokens are ambiguous. The general rule is:

    Avoid using import for lexer grammars which contain rules that are not marked with the fragment modifier.

  2. The ATTR token will be assigned to all inputs matching what looks like an ATTR, regardless of whether or not the predicate in the attr rule succeeds. This prevents inputs which match the ATTR rule from ever being considered as another token type. You should move the semantic predicate from the attr (parser) rule to the ATTR (lexer) rule to prevent the lexer from ever creating ATTR tokens for inputs that are not in the set of predefined attributes.

  3. The ParserRuleContext.exception field is not guaranteed to be set in the event of a syntax error. The only way to determine that a syntax error did not occur is to call Parser.getNumberOfSyntaxErrors() after parsing, or to add your own ANTLRErrorListener.

  4. Your last lexer rule should resemble the following. Otherwise, input sequences that do not match any lexer rule are dropped by the lexer and never reach the parser. This rule passes those inputs on to the parser for handling/reporting.

    ErrorChar : . ;

  5. For complicated grammars, avoid using combined grammars. Instead, create a separate lexer grammar and parser grammar, where the parser grammar uses the tokenVocab option to import the tokens. Combined grammars allow you to implicitly declare lexer rules by writing string literals in parser rules, which reduces the maintainability of large grammars.

  6. ReplaceInWith.g4 contains many rules with embedded actions. These actions should be moved to a separate listener that is run after parsing is complete, and the returns clauses should be removed from these rules. This improves both the portability and the reusability of the grammar. An example of how to do this can be seen in these commits, which are part of a larger pull request showing the conversion of an application using ANTLR 3 to ANTLR 4.
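
To see why points 2 and 4 matter for the 'any' case, here is a rough plain-Java model of the tokenizing step. It is not ANTLR and does not use the ANTLR API; it only imitates the idea with java.util.regex. The class name RuleLexerSketch and the methods tokenTypes and accepts are made up for illustration, the token names mirror the question's grammar, and the sample inputs assume the with keyword is part of each statement. Unmatched characters become ERRORCHAR tokens instead of disappearing, and a name only becomes ATTR when it is in the attribute set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model of points 2 and 4 in plain Java. This is NOT ANTLR or its API. */
public class RuleLexerSketch {

    // Token types in priority order; ERRORCHAR is last, like the catch-all lexer rule.
    private static final List<String> TYPES =
            List.of("WS", "REPLACE", "IN", "WITH", "VALUE", "ATTR", "ERRORCHAR");

    // One named group per token type, mirroring the lexer rules in the question.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<WS>[ \\t\\r\\n]+)"
            + "|(?<REPLACE>replace|rep)"
            + "|(?<IN>in)"
            + "|(?<WITH>with)"
            + "|(?<VALUE>'[a-zA-Z0-9-]+')"
            + "|(?<ATTR>[A-Z][a-zA-Z0-9/]+(?: [A-Z][a-zA-Z0-9/]+)?)"
            + "|(?<ERRORCHAR>.)",
            Pattern.DOTALL);

    /** Returns the token types found in the input, skipping whitespace. */
    public static List<String> tokenTypes(String input, Set<String> attributes) {
        List<String> types = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalStateException("catch-all should match any character");
            }
            String type = null;
            for (String candidate : TYPES) {
                if (m.group(candidate) != null) { type = candidate; break; }
            }
            int length = m.end() - pos;
            // Point 2: the attribute check lives in the "lexer",
            // so unknown names never become ATTR tokens.
            if ("ATTR".equals(type) && !attributes.contains(m.group())) {
                type = "ERRORCHAR";
                length = 1;
            }
            if (!"WS".equals(type)) {
                types.add(type);
            }
            pos += length;
        }
        return types;
    }

    /** Stand-in for the replace_in_with parser rule: only two token sequences are valid. */
    public static boolean accepts(String input, Set<String> attributes) {
        List<String> types = tokenTypes(input, attributes);
        return types.equals(List.of("REPLACE", "VALUE", "IN", "ATTR", "WITH", "VALUE"))
                || types.equals(List.of("REPLACE", "ATTR", "WITH", "VALUE"));
    }

    public static void main(String[] args) {
        Set<String> attributes = Set.of("Name", "Number");
        System.out.println(accepts("replace 'acme' in Name with 'acme'", attributes));    // true
        System.out.println(accepts("replaceany 'acme' in Name with 'acme'", attributes)); // false
        System.out.println(tokenTypes("replaceany 'acme'", attributes));
        // [REPLACE, ERRORCHAR, ERRORCHAR, ERRORCHAR, VALUE]
    }
}
```

Without the catch-all alternative, the characters of 'any' would simply vanish from the token stream and the statement would look valid, which matches the behaviour described in the question. In a real ANTLR grammar the equivalent effect should come from the ErrorChar rule in point 4 together with the syntax error check from point 3.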

