How to make an ANTLR rule "greedy"?


I started using ANTLR to generate a simple parser for interpolated strings. Example input strings follow (one per line):

hello {user.name}!
welcome on planet {getplanetname(" stupid string param :-} ")}
plain string without interpolated expression
string escaped {{ brackets }}

The grammar has to decide whether a plain string (plainstring) or an expression (expressionstring) follows:

    grammar t;

    patternstring:                  (plainstring | expressionstring)+
                                    ;

    plainstring:                    (cbo_escapesequence | cbc_escapesequence | plainstringliteral)+
                                    ;

    expressionstring:               cbo expression cbc | curlybrackets_empty
                                    ;

    expression:                     expressionsegment+
                                    ;

    expressionsegment:              ~('"' | '\'' | '{' | '(' | '[' | '}' | ')' | ']' | cbo_escapesequence | cbc_escapesequence)+
                                    | '(' expressionsegment+ ')' | '(' ws ')' | '()'
                                    | '[' expressionsegment+ ']' | '[' ws ']' | '[]'
                                    | '{' expressionsegment+ '}' | curlybrackets_empty
                                    | stringliteral
                                    | charliteral
                                    ;

    stringliteral:                  '"' (~('"') | '\\"')+ '"'
                                    | '""'
                                    ;

    charliteral:                    '\'' (~('\'') | '\\\'')+ '\''
                                    ;

    fragment ws:                    (' ' | '\r' | '\n' | '\t')+;

    plainstringliteral:             ~('{' | '}');
    curlybrackets_empty:            (cbo ws cbc | cbo cbc);
    cbo:                            '{';
    cbc:                            '}';

    fragment cbo_escapesequence:    '{{';
    fragment cbc_escapesequence:    '}}';

This works, except for strings like the following:

{{{new[]{1, 2, 3, 4}}}}

which gives me the following AST:

    patternstring                                 => '{{{new[]{1, 2, 3, 4}}}}'
        expressionstring                          => '{{{new[]{1, 2, 3, 4}}}}'
            expression                            => '{{new[]{1, 2, 3, 4}}}'
                expressionsegment                 => '{{new[]{1, 2, 3, 4}}}'
                    expressionsegment             => '{new[]{1, 2, 3, 4}}'
                        expressionsegment         => 'new[]'
                        expressionsegment         => '{1, 2, 3, 4}'
                            expressionsegment     => '1, 2, 3, 4'

whereas I expect (and want) the following AST:

    patternstring                                 => '{{{new[]{1, 2, 3, 4}}}}'
        plainstring                               => '{{'
        expressionstring                          => '{new[]{1, 2, 3, 4}}'
            expression                            => 'new[]{1, 2, 3, 4}'
                expressionsegment                 => 'new[]'
                expressionsegment                 => '{1, 2, 3, 4}'
                    expressionsegment             => '1, 2, 3, 4'
        plainstring                               => '}}'

Meaning, plainstring should be more greedy and take the escaped brackets whenever possible. How can I fix this in the above grammar?
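To make the intended behaviour concrete, here is a minimal Python sketch of a hand-written scanner that produces the split described above. It is not ANTLR and not a fix for the grammar; the function names (split_pattern, scan_expression, skip_quoted) are hypothetical, and it assumes that an escape pair is always preferred over an expression start at the top level and that backslash escapes are the only escapes inside quoted literals.

```python
from typing import List, Tuple

# Assumption: these are the only bracket pairs tracked inside an expression.
OPENERS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(OPENERS.values())


def skip_quoted(text: str, i: int) -> int:
    """Return the index just past the quoted literal that starts at text[i]."""
    quote = text[i]
    i += 1
    while i < len(text):
        if text[i] == "\\":        # backslash escape: skip the escaped character
            i += 2
        elif text[i] == quote:     # closing quote found
            return i + 1
        else:
            i += 1
    raise ValueError("unterminated literal")


def scan_expression(text: str, start: int) -> int:
    """Return the index just past the '}' that closes the '{' at text[start]."""
    stack = ["}"]                  # closers we still expect, innermost last
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch in ('"', "'"):
            i = skip_quoted(text, i)   # brackets inside literals do not count
            continue
        if ch in OPENERS:
            stack.append(OPENERS[ch])
        elif ch in CLOSERS:
            if ch != stack[-1]:
                raise ValueError(f"unexpected {ch!r} at index {i}")
            stack.pop()
            if not stack:
                return i + 1
        i += 1
    raise ValueError("unterminated expression")


def split_pattern(text: str) -> List[Tuple[str, str]]:
    """Split text into ("plain", s) and ("expr", s) segments, left to right."""
    segments: List[Tuple[str, str]] = []
    plain = ""
    i = 0
    while i < len(text):
        if text[i:i + 2] in ("{{", "}}"):
            plain += text[i:i + 2]     # greedy: an escape pair wins at top level
            i += 2
        elif text[i] == "{":
            if plain:
                segments.append(("plain", plain))
                plain = ""
            end = scan_expression(text, i)
            segments.append(("expr", text[i:end]))
            i = end
        elif text[i] == "}":
            raise ValueError(f"unbalanced '}}' at index {i}")
        else:
            plain += text[i]
            i += 1
    if plain:
        segments.append(("plain", plain))
    return segments


print(split_pattern("{{{new[]{1, 2, 3, 4}}}}"))
# [('plain', '{{'), ('expr', '{new[]{1, 2, 3, 4}}'), ('plain', '}}')]
```

The point of the sketch is that the choice between an escape pair and a closing brace depends on nesting depth: inside the expression each '}' closes a bracket, and only after the depth returns to zero is '}}' read as an escape again.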

I think the issue is due to the explicit definition of rules for the opening and closing curly braces, while some of the parser rules still reference the braces as string literals. After modifying the expressionsegment rule to reference the lexer rules instead, the issue seems to be resolved. Please try out this grammar and see if the issue is fixed:

    expressionstring:               cbo expression cbc | curlybrackets_empty
                                    ;

    expression:                     expressionsegment+
                                    ;

    expressionsegment:
                                    l_paren expressionsegment+ r_paren
                                    | l_bracket expressionsegment+ r_bracket
                                    | cbo expressionsegment+ cbc
                                    | l_paren ws r_paren
                                    | l_bracket ws r_bracket
                                    | l_paren r_paren
                                    | l_bracket r_bracket
                                    | curlybrackets_empty
                                    | stringliteral
                                    | charliteral
                                    | ~(double_quote | single_quote | cbc | cbo | l_paren | l_bracket | r_paren | r_bracket)+
                                    ;

    stringliteral:                  '"' (~('"') | '\\"')+ '"'
                                    | '""'
                                    ;

    charliteral:                    '\'' (~('\'') | '\\\'')+ '\''
                                    ;

    ws:                             (' ' | '\r' | '\n' | '\t')+;

    plainstringliteral:             ~('{' | '}');
    curlybrackets_empty:            (cbo ws cbc | cbo cbc);
    cbo:                            '{';
    cbc:                            '}';
    l_paren:                        '(';
    r_paren:                        ')';
    l_bracket:                      '[';
    r_bracket:                      ']';
    single_quote:                   '\'';
    double_quote:                   '"';

As you can see, the parse tree seems to reflect what you are looking for:

[image: parse tree]

