0

As a pet project, I'm trying to write a groff parser with Jison (a JavaScript clone of Bison), but I'm racking my brain trying to figure out whether groff's grammar is LALR(1).

Does anyone have any insight into this?

Thanks in advance.

Update 1

In response to Brian's concerns, here are more details about my problem:

  • Groff is written in C++ and does not use Bison; I'm deriving the grammar myself.

  • I've uploaded all my progress here

roperzh
  • I think you need to give us more information. Do you have a source for the grammar of groff, or are you deriving it yourself? If you are deriving it yourself, what do you have so far? Can you explain the problems you are having with it in detail? Then, perhaps, we could help. No one is going to work this out from scratch for you... – Brian Tompsett - 汤莱恩 Nov 06 '15 at 20:21
  • Hey @BrianTompsett-汤莱恩, I'm absolutely not asking for a solution; I'm trying to find guidance on whether it makes sense to parse groff with Bison. – roperzh Nov 06 '15 at 20:35

2 Answers

2

Most of the work parsing troff is lexical, although you could make use of a parser to evaluate arithmetic expressions. The "grammar" is otherwise just a question of identifying control lines and splitting them into arguments (again, essentially lexical).
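To make the "essentially lexical" point concrete, here is a minimal sketch of that line-level split in plain JavaScript, with no parser generator involved. The names `classifyLine` and `splitArgs` are hypothetical, and the sketch assumes the default control characters `.` and `'`; it ignores escapes, comments, and line continuation, so treat it as an illustration rather than a faithful troff reader.

```javascript
// Split one argument string into arguments. An argument that starts with a
// double quote runs to the closing quote; "" inside it stands for a literal
// quote. This is a simplification of real troff argument handling.
function splitArgs(text) {
  const args = [];
  const re = /"((?:[^"]|"")*)"?|(\S+)/g;
  let m;
  while ((m = re.exec(text)) !== null) {
    args.push(m[1] !== undefined ? m[1].replace(/""/g, '"') : m[2]);
  }
  return args;
}

// Classify one input line (without its trailing newline) as a control line
// or a text line. Assumes the default control characters "." and "'".
function classifyLine(line) {
  const first = line[0];
  if (first !== "." && first !== "'") {
    return { type: "text", text: line };
  }
  // Whitespace may separate the control character from the request name.
  const rest = line.slice(1).replace(/^[ \t]+/, "");
  const match = /^(\S*)[ \t]*(.*)$/.exec(rest);
  return {
    type: "control",
    noBreak: first === "'",
    name: match[1],
    args: splitArgs(match[2]),
  };
}
```

For example, `classifyLine('.TH GROFF 1 "6 November 2015"')` yields a control line named `TH` with three arguments, the last one containing spaces.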

If you intend to implement the controls which modify control and escape characters (.cc, .c2, .ec and .eo), then you will find precompiled regular expressions to be awkward, although the workaround for control characters is not awful.
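One possible shape for the control-character workaround, sketched in JavaScript: keep the current control characters as scanner state and compare against them at run time, instead of baking `.` and `'` into a precompiled regular expression. `ControlScanner` is a hypothetical name, argument splitting is deliberately naive (whitespace only), and `.ec`/`.eo` are not handled here since changing the escape character affects far more of the lexer.

```javascript
// Hypothetical stateful line scanner. The control characters are ordinary
// fields, so .cc and .c2 can change them while the input is being read.
class ControlScanner {
  constructor() {
    this.cc = "."; // control character
    this.c2 = "'"; // no-break control character
  }

  // Scan one line (without its trailing newline).
  scan(line) {
    const first = line[0];
    if (first !== this.cc && first !== this.c2) {
      return { type: "text", text: line };
    }
    // Naive split on whitespace; quoted arguments are not handled here.
    const [name = "", ...args] = line.slice(1).trim().split(/[ \t]+/);
    // With no argument, each request restores its default character.
    if (name === "cc") this.cc = args[0] ? args[0][0] : ".";
    if (name === "c2") this.c2 = args[0] ? args[0][0] : "'";
    return { type: "control", name, args };
  }
}
```

After `scan(".cc #")`, a line such as `#sp 2` is treated as a control line while `.sp 2` becomes ordinary text, until a bare `#cc` restores the default.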

I think I'd be inclined to restrict use of jison to pieces of the language like arithmetic expressions.

Of course, jison would come in handy for preprocessors like eqn, in case that is in your plans.

rici
1

As @rici said, most of the parsing work is just lexical. Other than the expressions (and possibly macros/diversions), the request/escape language itself is probably LL(1). jison/bison is almost certainly up to the task and, indeed, probably overkill.

Based on your code so far it looks like you're implementing a parser for manpages specifically, rather than for general troff input. If so, that simplifies what you need to handle; manpages generally don't use conditional logic or macros (although the man macros themselves may).

evil otto