Java - ANTLR4 accepting additional tokens as valid?
I'm building a small rule language to test, and I used ANTLR for it. I'm using ANTLR v4 and have the following grammar, split up as follows:
Lexer.g4

```
lexer grammar Lexer;

/*------------------------------------------------------------------
 * Lexer rules - generic keywords
 *------------------------------------------------------------------*/
NOT  : 'not' ;
NULL : 'null' ;
AND  : 'and' | '&' ;

/*------------------------------------------------------------------
 * Lexer rules - pattern matching
 *------------------------------------------------------------------*/
DELIM : [\|\\/:,&@+><^] ;
WS    : [ \t\r\n]+ -> skip ;
VALUE : SQUOTE TEXT SQUOTE ;
fragment SQUOTE : '\'' ;
fragment TEXT   : ( 'a'..'z' | 'A'..'Z' | '0'..'9' | '-' )+ ;
```
Attribute.g4

```
grammar Attribute;

/*------------------------------------------------------------------
 * Semantic predicate
 *
 * Attributes are capitalised words which may have spaces. They're
 * loaded from the database and set in the glue code so that they
 * can be cross-checked here. If the grammar passed in sees an
 * attribute, it will pass as long as the attribute is in the
 * database; otherwise the grammar will fail to parse.
 *------------------------------------------------------------------*/
attr : a=ATTR {attributes.contains($a.text)}? ;
ATTR : ([A-Z][a-zA-Z0-9/]+([ ][A-Z][a-zA-Z0-9/]+)?) ;
```
ReplaceInWith.g4

```
grammar ReplaceInWith;

/*------------------------------------------------------------------
 * Replace in parser rules
 *------------------------------------------------------------------*/
replace_in_with
    : rep in with {row.put($in.value, $in.value.replace($rep.value, $with.value));}
    | repatt with {row.put($repatt.value, $with.value);}
    ;

rep returns [String value]
    : REPLACE v=VALUE {$value = trimQuotes($v.text);}
    ;

repatt returns [String value]
    : REPLACE a=attr {$value = $a.text;}
    ;

in returns [String value]
    : IN a=attr {$value = $a.text;}
    ;

with returns [String value]
    : WITH v=VALUE {$value = trimQuotes($v.text);}
    ;

/*------------------------------------------------------------------
 * Lexer rules - keywords
 *------------------------------------------------------------------*/
REPLACE : 'rep' | 'replace' ;
IN      : 'in' ;
WITH    : 'with' ;
```
Parser.g4

```
grammar Parser;

/*------------------------------------------------------------------
 * Imported rules
 *------------------------------------------------------------------*/
import
    // Essential imports
    Attribute,
    GlueCode,
    Lexer,

    // Actual rules
    ReplaceInWith;

/*------------------------------------------------------------------
 * Parser rules
 * Each top-level rule must be added here for it to be callable
 *------------------------------------------------------------------*/
eval : replace_in_with ;
```
GlueCode.g4

Java that supplies static calling functionality to the grammar and sets the attributes from the database.
ParserErrorListener.java

```java
import org.antlr.v4.runtime.ParserRuleContext;
import org.antlr.v4.runtime.misc.NotNull;

public class ParserErrorListener extends ParserBaseListener {

    /**
     * After every rule, check to see if an exception was thrown; if so, exit with a
     * runtime exception to indicate a parser problem.<p>
     */
    @Override
    public void exitEveryRule(@NotNull ParserRuleContext ctx) {
        super.exitEveryRule(ctx);
        if (ctx.exception != null) {
            // ParserRuntimeException is an application-defined exception class
            throw new ParserRuntimeException(String.format("Error evaluating expression(s) '%s'", ctx.exception));
        } //if
    } //exitEveryRule
} //class
```
When I supply the following, the grammar passes as expected:

```
"replace 'acme' in Name with 'acme'",
"rep 'acme' in Name with 'acme'",
"replace 'acme' in Name with 'acme'",
"rep 'acme' in Name with 'acme'",
"replace 'e' in Name with 'i'",
"rep 'e' in Name with 'i'",
"replace '-' in Number with ' '",
"rep '-' in Number with ' '",
"replace '555' in Number with '00555'",
"rep '555' in Number with '00555'"
```

where Name and Number are set up as attributes for the semantic predicate.
However, when I pass in the following statements, the grammar still passes, and I'm not sure why it matches:

```
"replace any 'acme' in Name with 'acme'",
"replaceany 'acme' in Name with 'acme'",
```
Again, Name is passed in as an attribute and matched by the semantic predicate; that part of the grammar works in tests. The part that's failing is the 'any' part. The grammar matches replace, then takes the next token to be 'acme', ignoring the 'any' part in both examples above. I was expecting the grammar to fail here, and in the listener I have added a check on rule exit that should throw a runtime exception, which is caught by the glue code to indicate failure.
Any ideas on how I can make the grammar throw an error when this occurs?
First and foremost, lexer rules are global in ANTLR. Every token in the input is assigned one, and only one, token type. If you separate your lexer rules into multiple files, it becomes a maintenance nightmare to determine the cases where tokens are ambiguous. The general rule is:

- Avoid using `import` for lexer grammars that contain rules not marked with the `fragment` modifier.

The `ATTR` token is assigned to every input that matches the `ATTR` rule, regardless of whether or not the predicate in the `attr` rule succeeds. This also prevents inputs that match the `ATTR` rule from being considered as any other token type. You should move the semantic predicate from the `attr` rule to the `ATTR` rule, so that the lexer never creates `ATTR` tokens for inputs that are not in the set of predefined attributes.
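To make the difference concrete, here is a small stdlib-only Java model (not ANTLR code) of the decision involved. The regex mirrors the `ATTR` rule from Attribute.g4, the attribute set stands in for whatever the glue code loads from the database, and the class and method names are hypothetical. In the grammar itself the predicate would presumably sit at the end of the lexer rule, something like `{attributes.contains(getText())}?`, assuming `attributes` is made visible to the generated lexer (for example through a `@lexer::members` block), which the question does not show.

```java
import java.util.Set;
import java.util.regex.Pattern;

// Stdlib-only model of the ATTR decision; all names here are hypothetical.
public class AttrPredicateDemo {

    // Same shape as the ATTR lexer rule: a capitalised word, optionally followed by
    // a single space and a second capitalised word.
    private static final Pattern ATTR_SHAPE =
            Pattern.compile("[A-Z][a-zA-Z0-9/]+(?:[ ][A-Z][a-zA-Z0-9/]+)?");

    // Predicate in the parser rule (current grammar): an ATTR token is created for any
    // input with this shape, and only the attr rule fails later.
    public static boolean hasAttrShape(String text) {
        return ATTR_SHAPE.matcher(text).matches();
    }

    // Predicate on the lexer rule (suggested): the token is only created when the shape
    // matches AND the text is one of the known attributes.
    public static boolean lexerEmitsAttr(String text, Set<String> attributes) {
        return hasAttrShape(text) && attributes.contains(text);
    }

    public static void main(String[] args) {
        // Stands in for the attribute set loaded from the database.
        Set<String> attributes = Set.of("Name", "Number");
        for (String candidate : new String[] {"Name", "Number", "Address", "name"}) {
            System.out.printf("%-8s ATTR shape: %-5b ATTR token with lexer predicate: %b%n",
                    candidate, hasAttrShape(candidate), lexerEmitsAttr(candidate, attributes));
        }
    }
}
```

With the predicate in the lexer, an unknown but well-shaped word such as `Address` never becomes an `ATTR` token, so the mismatch is detected at tokenization time instead of being left to the parser rule.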
The `ParserRuleContext.exception` field is not guaranteed to be set in the event of a syntax error. The only way to determine that a syntax error did not occur is to call `Parser.getNumberOfSyntaxErrors()` after parsing, or to add your own `ANTLRErrorListener`.
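A stdlib-only model of the second option: one collector shared by the lexer and the parser and checked once after parsing. In real ANTLR 4 code the collector would extend `BaseErrorListener`, override `syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line, int charPositionInLine, String msg, RecognitionException e)`, and be attached with `addErrorListener` on both the lexer and the parser (typically after calling `removeErrorListeners()`). What the model illustrates is that `Parser.getNumberOfSyntaxErrors()` only counts errors reported by the parser, so a lexer-level token recognition error, like the one the unmatched `any` characters cause, never shows up in it. The class and method names are hypothetical, and `IllegalStateException` stands in for the question's `ParserRuntimeException`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a custom error listener shared by lexer and parser.
public class SharedErrorCollector {
    private final List<String> problems = new ArrayList<>();
    private int parserReports = 0;

    // In real ANTLR code this role is played by BaseErrorListener.syntaxError(...);
    // here the reporting stage is passed explicitly for illustration.
    public void syntaxError(boolean fromLexer, int line, int charPositionInLine, String msg) {
        problems.add((fromLexer ? "lexer" : "parser") + " line " + line + ":" + charPositionInLine + " " + msg);
        if (!fromLexer) {
            parserReports++;
        }
    }

    // Models what Parser.getNumberOfSyntaxErrors() reflects: parser-side reports only.
    public int parserOnlyCount() {
        return parserReports;
    }

    // Everything reported by either stage.
    public int totalCount() {
        return problems.size();
    }

    // What the glue code could call after parsing, instead of inspecting ctx.exception.
    public void throwIfAny() {
        if (!problems.isEmpty()) {
            throw new IllegalStateException("Error evaluating expression(s) " + problems);
        }
    }

    public static void main(String[] args) {
        SharedErrorCollector errors = new SharedErrorCollector();
        // Simulated lexer report for the unmatched characters after "replace".
        errors.syntaxError(true, 1, 7, "token recognition error (simulated)");
        System.out.println("parser-only count: " + errors.parserOnlyCount());
        System.out.println("total count: " + errors.totalCount());
        try {
            errors.throwIfAny();
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```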
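Your last lexer rule should resemble the following. Otherwise, input sequences that do not match any lexer rule are dropped from the token stream: by default the lexer only prints a "token recognition error" message to the console, and the parser never sees them. This rule passes those inputs on to the parser for handling and reporting.

```
ERRORCHAR : . ;
```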
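The effect of a catch-all rule can be seen with a toy maximal-munch lexer in plain Java. This is a simplified model, not ANTLR's implementation: it covers only a subset of the question's rules (leaving out `AND`, `DELIM` and others), takes the longest match with ties going to the earlier rule (the same tie-break ANTLR uses), and, when nothing matches, simply skips one character, which simplifies ANTLR's actual recovery. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of a maximal-munch lexer; NOT ANTLR's implementation.
public class CatchAllLexerDemo {

    // A subset of the question's rules, in priority order (earlier wins a length tie).
    static Map<String, Pattern> rules(boolean withErrorChar) {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("REPLACE", Pattern.compile("rep(?:lace)?")); // greedy, so "replace" beats "rep"
        rules.put("IN", Pattern.compile("in"));
        rules.put("WITH", Pattern.compile("with"));
        rules.put("WS", Pattern.compile("[ \\t\\r\\n]+"));
        rules.put("VALUE", Pattern.compile("'[a-zA-Z0-9-]+'"));
        rules.put("ATTR", Pattern.compile("[A-Z][a-zA-Z0-9/]+(?:[ ][A-Z][a-zA-Z0-9/]+)?"));
        if (withErrorChar) {
            // Equivalent of ERRORCHAR : . ; placed last so it only wins when nothing else matches.
            rules.put("ERRORCHAR", Pattern.compile(".", Pattern.DOTALL));
        }
        return rules;
    }

    public static List<String> tokenize(String input, boolean withErrorChar) {
        Map<String, Pattern> rules = rules(withErrorChar);
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestType = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Only a strictly longer match replaces the best so far, so ties keep the earlier rule.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestType = rule.getKey();
                    bestLen = m.end() - pos;
                }
            }
            if (bestType == null) {
                // No rule matched: the character is dropped and never reaches the parser.
                pos++;
                continue;
            }
            if (!bestType.equals("WS")) { // WS is discarded, like "-> skip"
                tokens.add(bestType + "(" + input.substring(pos, pos + bestLen) + ")");
            }
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "replaceany 'acme' in Name with 'acme'";
        System.out.println("without ERRORCHAR: " + tokenize(input, false));
        System.out.println("with ERRORCHAR:    " + tokenize(input, true));
    }
}
```

Without the catch-all, `replaceany 'acme' in Name with 'acme'` yields exactly the token sequence of a well-formed statement, which is why it parses. With it, the parser receives three `ERRORCHAR` tokens right after `REPLACE`, where `rep` expects a `VALUE`, so the mistake surfaces as an ordinary syntax error.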
For complicated grammars, avoid using combined grammars. Instead, create separate `lexer grammar` and `parser grammar` grammars, where the parser grammars use the `tokenVocab` option to import the tokens. Combined grammars allow you to implicitly declare lexer rules by writing string literals in parser rules, which reduces the maintainability of large grammars.

ReplaceInWith.g4 contains many rules with embedded actions. These actions should be moved to a separate listener that you run after parsing is complete, and the `returns` clauses should be removed from these rules. This improves both the portability and the reusability of your grammar. An example of how to do this can be seen in these commits, which are part of a larger pull request showing the conversion of an application from ANTLR 3 to ANTLR 4.
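A sketch of what the embedded actions could become once moved out of ReplaceInWith.g4: plain Java methods that a listener would presumably call from the generated `exitReplace_in_with` method, taking token text from the parse tree instead of from `returns` values. `trimQuotes` here is a hypothetical reimplementation (the question calls it but does not show it). `replaceIn` assumes the intent is to apply the replacement to the row's current value for the attribute; the action in the question calls `$in.value.replace(...)`, which operates on the attribute name itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical post-parse step replacing the actions embedded in ReplaceInWith.g4.
public class ReplaceInWithActions {

    // Hypothetical helper: strips one pair of surrounding single quotes, if present.
    public static String trimQuotes(String quoted) {
        if (quoted.length() >= 2 && quoted.startsWith("'") && quoted.endsWith("'")) {
            return quoted.substring(1, quoted.length() - 1);
        }
        return quoted;
    }

    // "rep in with" alternative: replace text inside the attribute's current value.
    public static void replaceIn(Map<String, String> row, String attr, String repToken, String withToken) {
        String current = row.get(attr);
        if (current != null) {
            row.put(attr, current.replace(trimQuotes(repToken), trimQuotes(withToken)));
        }
    }

    // "repatt with" alternative: overwrite the attribute's whole value.
    public static void replaceAttr(Map<String, String> row, String attr, String withToken) {
        row.put(attr, trimQuotes(withToken));
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("Name", "acme widgets");
        row.put("Number", "555-1234");
        replaceIn(row, "Name", "'acme'", "'ACME'");
        replaceIn(row, "Number", "'555'", "'00555'");
        System.out.println(row.get("Name"));
        System.out.println(row.get("Number"));
    }
}
```

Keeping this logic in Java means the grammar only describes syntax, so the same grammar can be reused with a different listener (or a different target language) without touching the rules.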