Pyparsing: How to parse SQL Hints

Question

I am trying to parse the EBNF below (included as a comment in the code), and I'm struggling to handle the optional STRING part that can follow a hint (the "extra comment" text in my test string).

from pyparsing import *

# SQL HINT EBNF 
'''
{ /*+ hint [ string ]
      [ hint [ string ] ]... */
| --+ hint [ string ]
      [ hint [ string ] ]...
}
'''

test_string = "/*+ALL_ROWS extra comment FIRST_ROWS CACHE*/"

LCOMMENT = Literal("/*+")
RCOMMENT = Literal("*/")

grammar = Forward()

hint_all_rows = Literal("ALL_ROWS")
hint_first_rows = Literal("FIRST_ROWS")
hint_cache = Literal("CACHE")

comment_in_hint = Word(printables)

all_hints = (hint_all_rows | hint_first_rows | hint_cache)+ ZeroOrMore(comment_in_hint)

grammar <<  all_hints  + ZeroOrMore(grammar)

all_grammar = LCOMMENT + grammar + RCOMMENT

p = all_grammar.parseString(test_string)

print(p)

"extra" and "comment" don't appear anywhere in your parser, nor are there any catch-all type elements like `Word(printables)` which would accept any group of non-whitespace characters. — PaulMcG, May 11 '16 at 19:58
Paul, that is a fast response. And now I feel a bit stupid for not including my working! (As you can see, I am not an experienced SO user.) I shall edit my OP with, as you have already surmised, a catch-all. However, this causes the trailing two hints, FIRST_ROWS and CACHE, to be caught. — Spencer Attridge, May 12 '16 at 07:57
And I tried to enter the test in Edit but it's not showing (I think!). I had added a catch-all (comment_in_hint) to the code, but it's greedy and eats the closing comment bracket (RCOMMENT). — Spencer Attridge, May 12 '16 at 08:05
To get things working, change to `comment_in_hint = Word(printables, excludeChars='*')`. At some point, this will fail when you have a comment that contains an embedded '*' character, but this should help you make some forward progress for now. — PaulMcG, May 12 '16 at 08:38
I would not merge `comment_in_hint` as part of `all_hints`. Instead, have `all_hints` just be the list of defined hints, but change `grammar` to `grammar << (all_hints | comment_in_hint) + ZeroOrMore(grammar)`. You can also write this more simply as `grammar = ZeroOrMore(all_hints | comment_in_hint)` - no need for the recursive definition. — PaulMcG, May 12 '16 at 08:44
Lastly, change your hint expressions to use the Keyword class instead of Literal - this will protect you from accidentally matching a non-hint comment that happens to start with a hint, say "CACHE_DISABLED" for instance. — PaulMcG, May 12 '16 at 08:46
Ah. The amount of fixes makes me feel awful. Thanks for the effort and time to adjust it. First time with your pyparsing (and python too) and I'm having fun to be fair, but obviously trying to run before I can walk. — Spencer Attridge, May 12 '16 at 09:40
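For anyone following along, here is a rough sketch that pulls together the suggestions from the comments above: Keyword instead of Literal, '*' excluded from the catch-all, and a flat ZeroOrMore instead of the recursive Forward. The results names "Hints" and "Comments" and the second test string are my own additions for illustration, not something from the comments.

```python
from pyparsing import Keyword, Literal, Word, ZeroOrMore, printables

LCOMMENT = Literal("/*+")
RCOMMENT = Literal("*/")

# Keyword only matches whole words, so "CACHE_DISABLED" is not read as the CACHE hint.
hint = Keyword("ALL_ROWS") | Keyword("FIRST_ROWS") | Keyword("CACHE")

# Catch-all for free text; excluding '*' keeps it from consuming the closing "*/".
comment_in_hint = Word(printables, excludeChars="*")

# At each position try a hint first, then fall back to a comment word.
# The trailing '*' in a results name is shorthand for listAllMatches=True.
body = ZeroOrMore(hint("Hints*") | comment_in_hint("Comments*"))
all_grammar = LCOMMENT + body + RCOMMENT

result = all_grammar.parseString("/*+ALL_ROWS extra comment FIRST_ROWS CACHE*/")
print(list(result["Hints"]))
print(list(result["Comments"]))

# Illustrative input: a comment word that merely starts with a hint name.
other = all_grammar.parseString("/*+ CACHE_DISABLED */")
print(list(other["Comments"]))
```

Because the hint alternative is tried before the catch-all, the defined hints should end up under "Hints" and everything else under "Comments". As noted in the comments, a comment containing an embedded '*' will still break this.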

Spencer Attridge · Accepted Answer · 2016-05-12T12:31:37.287

This is the code that now runs, thanks to Paul McGuire's help in the comments on the OP. I did get rid of the Forward when I first posted this answer. But when I checked the code by attaching results names to the different elements, I noticed that my first version was classifying all but the first hint as comments. So I kept the Forward but used some of Paul's other suggestions.

from pyparsing import *

# SQL HINT EBNF
'''
{ /*+ hint [ string ]
      [ hint [ string ] ]... */
| --+ hint [ string ]
      [ hint [ string ] ]...
}
'''

LCOMMENT = Literal("/*+")
RCOMMENT = Literal("*/")

grammar = Forward()

hint_all_rows = Keyword("ALL_ROWS")
hint_first_rows = Keyword("FIRST_ROWS")
hint_cache = Keyword("CACHE")

comment_in_hint = Word(printables, excludeChars='*')

all_hints = (hint_all_rows | hint_first_rows | hint_cache).setResultsName("Hints", listAllMatches=True) + Optional(comment_in_hint)("Comments*")

grammar << all_hints + ZeroOrMore(grammar)

all_grammar = LCOMMENT + grammar + RCOMMENT

p = all_grammar.parseString("/*+ ALL_ROWS aaaaaaa FIRST_ROWS bbbbb */")

print(p["Hints"])

print(p["Comments"])
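One caveat with the code above, as far as I can tell: Optional(comment_in_hint) accepts only a single word, and since that word can be any printable text, a hint that directly follows another hint can be swallowed as a comment. A possible variant, offered only as a sketch, puts a negative lookahead (~hint) in front of the catch-all and groups each "hint [ string ]" unit. The names comment_word and hint_entry, and the use of Suppress and Group, are my own choices here rather than anything from the thread.

```python
from pyparsing import Group, Keyword, Suppress, Word, ZeroOrMore, printables

# Suppress drops the delimiters from the results.
LCOMMENT = Suppress("/*+")
RCOMMENT = Suppress("*/")

hint = Keyword("ALL_ROWS") | Keyword("FIRST_ROWS") | Keyword("CACHE")

# Negative lookahead: a comment word is any word that is not itself a hint.
comment_word = ~hint + Word(printables, excludeChars="*")

# One "hint [ string ]" unit from the EBNF; Group keeps each hint
# together with the comment words that follow it.
hint_entry = Group(hint + ZeroOrMore(comment_word))

all_grammar = LCOMMENT + ZeroOrMore(hint_entry) + RCOMMENT

parsed = all_grammar.parseString("/*+ALL_ROWS extra comment FIRST_ROWS CACHE*/")
for entry in parsed:
    # First item is the hint, the rest are its comment words.
    print(list(entry))
```

The trade-off is that a comment can no longer contain a bare word that is also a hint name, and an embedded '*' in a comment is still not handled.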
