
I'm creating a syntax that supports significant whitespace (more like the "Z" Lisp variant than Python or YAML, but the same idea).

I came across this article on how to do significant-whitespace parsing in Pegasus, a PEG parser for C#.

But I've been less than successful at converting that to Parsley. It looks like the #STATE# variable in Pegasus follows backtracking in some way.

This is the closest I've gotten to a simple parser. If I use the version of indent with lookahead, it can't parse children, and if I use the version without, it can't parse siblings.

If this is a limitation of Parsley and I need to use PyPEG or Parsimonious or something, I'm open to that, but it seems like if the internal indent variable could follow the PEG's internal backtracking, this would all work.

import parsley

def indent(s):
    s['i'] += 2
    print('indent i=%d' % s['i'])


def deindent(s):
    s['i'] -= 2
    print('deindent i=%d' % s['i'])


grammar = parsley.makeGrammar(r'''
id = <letterOrDigit+>
eol = '\n' | end
nots = anything:x ?(x != ' ')

node =  I:i id:name eol !(fn_print(_state['i'], name)) -> i, name

#I = !(' ' * _state['i'])
I = (' '*):spaces ?(len(spaces) == _state['i'])
#indent = ~~(!(' ' * (_state['i'] + 2)) nots) -> fn_indent(_state)
#deindent = ~~(!(' ' * (_state['i'] - 2)) nots) -> fn_deindent(_state)

indent = -> fn_indent(_state)
deindent = -> fn_deindent(_state)

child_list = indent (ntree+):children deindent -> children

ntree = node:parent (child_list?):children -> parent, children
nodes = ntree+

''', {
    '_state': {'i': 0},
    'fn_indent': indent,
    'fn_deindent': deindent,
    'fn_print': print,
})

test_string = '\n'.join((
    'brother',
    '  brochild1',
    #'    gchild1',
    #'  brochild2',
    #'    grandchild',
    'sister',
    #'  sischild',
    #'brother2',
))

nodes = grammar(test_string).nodes()
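
To illustrate what "indentation state that follows backtracking" could look like, here is a minimal hand-written sketch in plain Python. It is not Parsley and not a fix for the grammar above; the names `parse_nodes` and `parse` are hypothetical, and it assumes a fixed two-space indent step as in the test string. The indent level is passed as an argument instead of being stored in a shared dict, so when an attempt at a deeper level matches nothing, the caller's level is untouched.

```python
def parse_nodes(lines, pos=0, indent=0):
    """Parse sibling nodes indented by exactly `indent` spaces.

    Returns (trees, next_pos), where each tree is (name, children).
    `indent` is a plain argument, so returning from a failed attempt
    at a deeper level automatically restores the caller's level.
    """
    trees = []
    while pos < len(lines):
        line = lines[pos]
        name = line.lstrip(' ')
        depth = len(line) - len(name)
        if depth != indent:
            break  # not a sibling at this level; let the caller decide
        # Try to parse children one step deeper (assumed step: 2 spaces).
        # If the next line is not deeper, this returns ([], pos + 1).
        children, pos = parse_nodes(lines, pos + 1, indent + 2)
        trees.append((name, children))
    return trees, pos


def parse(text):
    """Parse the whole text, rejecting lines that fit no valid level."""
    lines = text.split('\n')
    trees, pos = parse_nodes(lines)
    if pos != len(lines):
        raise ValueError('unexpected indentation on line %d' % (pos + 1))
    return trees


if __name__ == '__main__':
    sample = '\n'.join((
        'brother',
        '  brochild1',
        '    gchild1',
        '  brochild2',
        'sister',
        '  sischild',
        'brother2',
    ))
    for tree in parse(sample):
        print(tree)
```

The same idea (threading the level through the rules rather than mutating `_state`) may or may not be expressible in a given PEG library; that part is an open question here, not something this sketch demonstrates.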
Mark Harviston