I am writing my first program which uses pyparsing.

I want to parse a file where each line ended by "\n" is a token.

Please explain how to do it.

In fact, I need to parse .lyx files. One example of a .lyx file: https://github.com/nicowilliams/lyx/blob/master/lib/examples/Braille.lyx

MichaelChirico
porton
  • What is the requested output? – omri_saadon Jul 22 '15 at 13:13
  • @omri_saadon: At the first stage I want just to split the file into lines. (A token is a string.) Afterward (trick!) I am going to parse each token with **another** parser. That is, I will first split into tokens, then parse each token. – porton Jul 22 '15 at 13:19
  • The main parser uses that ("another") parser to determine interesting sequences of tokens. The "another" parser is used only to check properties of tokens; the rest of the work is done by the main parser. – porton Jul 22 '15 at 13:24
  • If your steps really are so independent, then I would suggest just using splitlines() to break up the initial string by line, then pass each line to the parser, something like `for line in input_string.splitlines(): result = line_parser.parseString(line)`. You can even use the sum builtin like this to merge all the results into a single structure (note the use of Group around your line parser to keep each line's data separate): `all_results = sum(Group(line_parser).parseString(line) for line in input_string.splitlines())` – PaulMcG Jul 23 '15 at 12:28
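The split-then-parse approach from the comment above can be sketched roughly as follows. The `line_parser` grammar, the `parse_lines` helper, and the sample text are assumptions made up for illustration (a LyX-style `\command` followed by the rest of the line); they are not taken from the question. Results are collected in a plain list here rather than merged with `sum`.

```python
from pyparsing import ParseException, Regex, restOfLine

# Hypothetical per-line grammar (an assumption, not from the question):
# a LyX-style "\command" followed by whatever remains on the line.
line_parser = Regex(r"\\[A-Za-z_]+")("command") + restOfLine("args")


def parse_lines(text):
    """Split text into lines, then parse each line independently."""
    results = []
    for line in text.splitlines():
        try:
            results.append(line_parser.parseString(line))
        except ParseException:
            # Lines that are not commands are kept as plain strings.
            results.append(line)
    return results


# Made-up sample input resembling a fragment of a .lyx file.
sample = "\\begin_layout Standard\nHello world\n\\end_layout\n"
parsed = parse_lines(sample)
```

Each entry in `parsed` is then either a pyparsing result for a command line or the untouched text of a non-command line, so a second pass can look for interesting sequences of lines.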

1 Answer

It seems that the following solves the task:

import sys
import pyparsing # parsley

all_files = sys.argv[1:]

if not all_files:
    print("Usage: DuplicateRefs.py FILE.lyx ...\n")
    sys.exit(1)

# Parse action: called for every matched line with the matched tokens.
def mylambda(tok):
    print(tok)

# A token is a run of non-newline characters followed by newline whitespace.
parser = pyparsing.ZeroOrMore(pyparsing.CharsNotIn("\n").setParseAction(mylambda) + pyparsing.White("\n"))

for file in all_files:
    parser.parseFile(file)
porton
  • Definitely does what the OP asked for - in place of `White("\n")`, try `LineEnd()`, I've never liked when parsers explicitly parse on whitespace and avoid it if I can. – PaulMcG Jul 23 '15 at 12:31
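A minimal sketch of the `LineEnd()` variant suggested in the comment, run on an in-memory string instead of a file. The sample text and the `suppress()` call are additions for illustration and are not part of the original answer.

```python
from pyparsing import CharsNotIn, LineEnd, ZeroOrMore

# One line = a run of non-newline characters, then the end of the line.
# suppress() drops the newline itself so only the line text is returned.
line = CharsNotIn("\n") + LineEnd().suppress()
parser = ZeroOrMore(line)

# Made-up sample input; a real script would call parser.parseFile(path).
tokens = parser.parseString("first line\nsecond line\n").asList()
print(tokens)
```

With the newline suppressed, the result is a flat list of line strings that can be handed to a second parser one at a time.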