I have a text file that looks similar to this:
section header 1:
some words can be anything
more words could be anything at all
etc etc lala
some other header:
as before could be anything
hey isnt this fun
I am trying to construct a grammar with pyparsing that would produce the following list structure when asking for the parsed results as a list (i.e. the following should be printed when iterating through the parsed.asList() elements):
['section header 1:',[['some words can be anything'],['more words could be anything at all'],['etc etc lala']]]
['some other header:',[['as before could be anything'],['hey isnt this fun']]]
The header names are all known beforehand, and individual headers may or may not appear. If they do appear, there is always at least one line of content.
The problem is that I am having trouble getting the parser to recognise where 'section header 1:' ends and 'some other header:' begins. I end up with a parsed.asList() looking like:
['section header 1:',[['some words can be anything'],['more words could be anything at all'],['etc etc lala'],['some other header'],['as before could be anything'],['hey isnt this fun']]]
(i.e. 'section header 1:' gets seen correctly, but everything following it gets added to section header 1, including further header lines etc.)
I've tried various things, and played with leaveWhitespace() and LineEnd() in various ways, but I can't figure it out.
The base parser I am hacking about with is (contrived example - in reality this is a class definition etc.):
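from pyparsing import Group, Literal, OneOrMore, Word, ZeroOrMore, printables

# each block is a known header followed by one or more lines of free text
header_1_line = Literal('section header 1:')
text_line = Group(OneOrMore(Word(printables)))
header_1_block = Group(header_1_line + Group(OneOrMore(text_line)))
header_2_line = Literal('some other header:')
header_2_block = Group(header_2_line + Group(OneOrMore(text_line)))
overall_structure = ZeroOrMore(header_1_block | header_2_block)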
and is being called with
parsed=overall_structure.parseFile()
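To make this reproducible without the input file, here is a self-contained sketch that runs the same grammar against the sample text from the top of this post. The `sample` string and the `blocks` name are only there for illustration, and parseString stands in for the parseFile call, so treat it as an approximation of the real setup rather than the actual code.

```python
from pyparsing import Group, Literal, OneOrMore, Word, ZeroOrMore, printables

# Sample text copied from the top of the post (stands in for the real file).
sample = """section header 1:
some words can be anything
more words could be anything at all
etc etc lala
some other header:
as before could be anything
hey isnt this fun
"""

# Grammar as posted above, unchanged.
header_1_line = Literal('section header 1:')
text_line = Group(OneOrMore(Word(printables)))
header_1_block = Group(header_1_line + Group(OneOrMore(text_line)))
header_2_line = Literal('some other header:')
header_2_block = Group(header_2_line + Group(OneOrMore(text_line)))
overall_structure = ZeroOrMore(header_1_block | header_2_block)

# parseString is used in place of parseFile so that no file is needed.
parsed = overall_structure.parseString(sample)
blocks = parsed.asList()

# Print each top-level block on its own line.
for block in blocks:
    print(block)
```

Running this gives a single top-level block: 'section header 1:' is picked up as a header, and everything after it, including the second header line, lands inside that block.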
Cheers, Matt.