Splitting text into lines with pyparsing

Question

I am writing my first program which uses pyparsing.

I want to parse a file in which each line, terminated by "\n", is a token.

Please explain how to do it.

In fact, I need to parse .lyx files. One example of a .lyx file: https://github.com/nicowilliams/lyx/blob/master/lib/examples/Braille.lyx

@omri_saadon: At the first stage I just want to split the file into lines. (A token is a string.) Afterward (trick!) I am going to parse each token with **another** parser. That is, I will first split into tokens, then parse each token. – porton, Jul 22 '15 at 13:19
The main parser uses that ("another") parser to determine interesting sequences of tokens. The "another" parser is used only to check properties of tokens; the rest of the work is done by the main parser. – porton, Jul 22 '15 at 13:24
If your steps really are so independent, then I would suggest just using splitlines() to break up the initial string by line, then passing each line to your parser, something like `for line in input_string.splitlines(): result = line_parser.parseString(line)`. You can even use the sum builtin like this to merge all the results into a single structure (note the use of Group around your line parser to keep each line's data separate): `all_results = sum(Group(line_parser).parseString(line) for line in input_string.splitlines())` – PaulMcG, Jul 23 '15 at 12:28
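
A minimal sketch of the splitlines() approach suggested in the comment above. The per-line grammar here (`line_parser`, whitespace-separated words) and the helper name `parse_lines` are assumptions for illustration only; the real grammar would be whatever the "another" parser needs to check on each token.

```python
import pyparsing as pp

# Hypothetical per-line grammar (not from the question):
# any number of whitespace-separated words.
line_parser = pp.ZeroOrMore(pp.Word(pp.printables))

def parse_lines(text):
    """Split text with str.splitlines(), then parse each line on its own."""
    return [line_parser.parseString(line).asList() for line in text.splitlines()]

# A few LyX-like lines, made up for the example.
sample = "\\begin_layout Standard\nHello world\n\\end_layout\n"
for tokens in parse_lines(sample):
    print(tokens)
```

Collecting the per-line results in a plain list keeps each line's tokens separate without needing Group, and it also behaves sensibly for empty input.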

score 2 · Accepted Answer · answered Jul 22 '15 at 14:43

It seems that the following solves the task:

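import sys
import pyparsing

all_files = sys.argv[1:]

if not all_files:
    print("Usage: DuplicateRefs.py FILE.lyx ...\n")
    sys.exit(1)

def print_line(tok):
    # Parse action: called once for each matched line.
    print(tok)

# A line is a run of non-newline characters followed by one or more newlines
# (White("\n") also swallows blank lines).
parser = pyparsing.ZeroOrMore(pyparsing.CharsNotIn("\n").setParseAction(print_line) + pyparsing.White("\n"))

for filename in all_files:
    parser.parseFile(filename)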

answered Jul 22 '15 at 14:43

porton

Definitely does what the OP asked for. In place of `White("\n")`, try `LineEnd()`; I've never liked it when parsers explicitly parse whitespace, and I avoid it if I can. – PaulMcG, Jul 23 '15 at 12:31
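
For reference, a rough sketch of the `LineEnd()` variant from the comment above. The helper name `split_lines` is an assumption for illustration, and the line end is suppressed so that only the line text is returned.

```python
import pyparsing as pp

# One line: a run of non-newline characters, then the end of the line.
# The LineEnd token is suppressed so it does not appear in the results.
line = pp.CharsNotIn("\n") + pp.LineEnd().suppress()
parser = pp.ZeroOrMore(line)

def split_lines(text):
    """Return the text of each line as a list of strings."""
    return parser.parseString(text).asList()

print(split_lines("a\nb c\n"))
```

One difference to keep in mind: `White("\n")` consumes a whole run of newlines, whereas a single `LineEnd()` matches only one, so as written this sketch stops at the first blank line. Files with blank lines would need the grammar adjusted.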
