Tokenizing and parsing pairs of strings with Jison (or Bison)

Question

I'm trying to build a parser with Jison (a JavaScript parser generator modeled on Bison that runs on Node.js) to parse a file that looks like this:

---
Redirect Test Patterns
---

one.html /two/
one/two.html /three/four/
one /two
one/two/ /three
one/two/ /three/four
one/two /three/four/five/
one/two.html http://three.four.com/
one/two/index.html http://three.example.com/four/
one http://two.example.com/three
one/two.pdf https://example.com
one/two?query=string /three/four/
go.example.com https://example.com

The goal

This is a file that stores redirection paths/URLs. There are other scripts that refer to this file when they need to know how to redirect a user. The goal is to develop a parser that I can run every time someone attempts to save the file. That way, I can make sure it's always formatted properly.

Basically, everything inside the --- block is to be ignored, as well as any empty lines. Each of the remaining lines represents a "redirection record".

Each "redirection record" must have the following structure:

INPUT_URL_OR_PATH <space> OUTPUT_URL_OR_PATH

In other words, there is to be a single space separating two strings.
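
To make the rule concrete, here is a minimal sketch of the same check written by hand in plain JavaScript, without Jison. The function name `validateRedirects` and the shape of the returned error objects are hypothetical, and it assumes the --- markers always sit on a line of their own, as in the sample file.

```javascript
// Hypothetical helper, not part of Jison: returns the lines that break the
// "INPUT <single space> OUTPUT" rule described above.
function validateRedirects(text) {
  const errors = [];
  let inHeader = false; // true while inside the --- block

  text.split(/\r?\n/).forEach((line, index) => {
    // Assumption: a line containing only "---" opens or closes the ignored block.
    if (line.trim() === "---") {
      inHeader = !inHeader;
      return;
    }
    // Skip everything inside the block, and skip empty lines.
    if (inHeader || line.trim() === "") return;

    // Exactly two runs of non-whitespace separated by one space character.
    if (!/^\S+ \S+$/.test(line)) {
      errors.push({ line: index + 1, text: line });
    }
  });

  return errors;
}
```

An empty array means every record line is well formed. The `line` property is the 1-based line number within the file, so it can be reported back to whoever tried to save it.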

What I have done so far

I am very new to grammars/parsing, so please bear with me.

The language grammar I have sketched out looks like this:

file -> lines EOF

lines -> record
lines -> lines record

record -> INPATH SPACE OUTPATH

The terminal symbols include: EOF, INPATH, SPACE, OUTPATH.
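
As a way to see how these terminals and rules fit together, the following is a hand-written sketch in plain JavaScript. It is not what Jison generates. The names `tokenizeRecord` and `parseFile` and the token object shape are hypothetical, and the repetition in `lines` is written as a loop rather than as a recursive rule.

```javascript
// Split one record line into the terminals named in the grammar.
function tokenizeRecord(line) {
  const match = /^(\S+)( )(\S+)$/.exec(line);
  if (!match) throw new Error(`Malformed record: "${line}"`);
  return [
    { type: "INPATH", value: match[1] },
    { type: "SPACE", value: match[2] },
    { type: "OUTPATH", value: match[3] },
  ];
}

// file -> lines EOF ; lines -> record | lines record
function parseFile(tokens) {
  const records = [];
  let pos = 0;

  // Consume the next token if it has the expected type, otherwise fail.
  const expect = (type) => {
    const token = tokens[pos];
    if (!token || token.type !== type) {
      throw new Error(`Expected ${type} at token ${pos}`);
    }
    pos += 1;
    return token.value;
  };

  // "lines" needs at least one record, so the body runs before the check.
  do {
    const input = expect("INPATH"); // record -> INPATH SPACE OUTPATH
    expect("SPACE");
    const output = expect("OUTPATH");
    records.push({ input, output });
  } while (tokens[pos] && tokens[pos].type !== "EOF");

  expect("EOF");
  return records;
}
```

Because the sketched grammar has no rule for an empty `lines`, a token stream holding only EOF is rejected here. Whether an empty file should be accepted is a separate decision.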

Unfortunately, I am not even at the point where I can implement that yet because I am having trouble developing my lexer.

This is what my jison file looks like:

/* description: Parses a list of redirects */

/* lexical grammar */
%lex

%x comment

%%

"---"                 this.begin("comment")
<comment>"---"        this.popState()
<comment>[\n]         /* skip new lines */
<comment>.            /* skip all characters */

[ \t\n]               /* do nothing */
(\w+)                 return 'WORD'
<<EOF>>               return 'EOF'
.                     /* do nothing */

/lex

/* operator associations and precedence */

/* n/a */

%start file

%% /* language grammar */

file
  : lines EOF
      { console.log($1); return $1; }
  | EOF
      { const msg = 'The target file is empty';
        console.log(msg);
        return msg; }
  ;

lines
  : lines WORD
      { console.log('WORD ', $2) }
  | WORD
      { console.log('WORD ', $1) }
  ;

Clearly, I am very far from being done. I am currently stuck on several things all at the same time.

Things I'm stuck on

Skipping empty lines.
Tokenizing INPATH, SPACE, and OUTPATH.
Using left-recursion in the language grammar section as opposed to right-recursion (What's the difference? Am I even doing it right? What's the best option here?).

In other words, I have no idea what I'm doing and could really use some help.

EDIT: I'm going to do more research and hopefully answer my own question eventually.

Is this your first time using Jison or an LR parser generator? If so, I would suggest taking a few days or more to work through examples. Also, get the lexer working before moving on to the parser. It also helps to work on sub-parts and get them working before attempting the entire grammar at once. This question borders on being closed as too broad. The good thing is that it is a simple grammar and a good one to learn with. The main question that comes to mind when reading this: are you validating the URLs? You don't say so in the question. — Guy Coder, Jan 22 '17 at 12:17
@GuyCoder thanks for the helpful response. Indeed, it is my first time attempting this. I'll take some time to read through more examples. I will eventually be validating the URLs, but I thought that was something I'd do later on once I could actually recognize the string itself first. Are there any specific resources you might suggest I take a look at? — adrianmcli, Jan 22 '17 at 18:30
I only used Jison for a few days to see whether it could do parsing with JavaScript, so I cannot point you to a specific place for help. I have been writing and using parsers for years, so working with Jison was just a quick exercise. Sorry I cannot be of more help. — Guy Coder, Jan 22 '17 at 18:43
@GuyCoder if that's the case, do you have any recommended textbooks for parsing in general? After all, I don't have to use Jison. If I can learn how to write lexers/parsers well enough with Flex/Bison, I'm sure that will get me 90% of the way there. — adrianmcli, Jan 22 '17 at 20:36
The classic recommendation is [Compilers: Principles, Techniques, and Tools (2nd Edition)](https://www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811). It is really meant to be used in a course, but you can use it as a self-learner. I learned parsing in school decades ago, before the book was even published. — Guy Coder, Jan 22 '17 at 21:11
Thanks, I'll do some reading and hopefully eventually post an answer to my original question. You've been a great help @GuyCoder, thank you. — adrianmcli, Jan 22 '17 at 23:14
