JFlex maximum read length

Question

Given a positional language like the old IBM RPG, we can have a line such as

CCCCCDIDENTIFIER     E S             10

Where characters

 1-5:  comment
   6:  specification type
7-21:  identifier name
...And so on

Now, given that JFlex is based on RegExp, we would have a RegExp such as:

[a-zA-Z][a-zA-Z0-9]{0,14} {0,14}

for the identifier name token.
This RegExp, however, can match up to 29 characters, well beyond the 15 allowed for the identifier name, so it requires yypushback calls.

Thus, is there a way to limit how many characters JFlex reads for a particular token?

rici · Accepted Answer · 2021-05-17T03:53:45.033

Regular expression based lexical analysis is really not the right tool to parse fixed-field inputs. You can just split the input into fields at the known character positions, which is way easier and a lot faster. And it doesn't require fussing with regular expressions.
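As a rough illustration of splitting at known positions, here is a minimal Java sketch. The column ranges come from the question (1-5 comment, 6 specification type, 7-21 identifier name); the class name, method name, and map keys are made up for the example, and the remaining columns are left out because the question does not describe them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FixedFields {
    // Column layout from the question (1-based, inclusive):
    // 1-5 comment, 6 specification type, 7-21 identifier name.
    // The field names used as map keys are hypothetical.
    static Map<String, String> split(String line) {
        // Pad short lines to 21 characters so substring never runs off the end.
        String padded = String.format("%-21s", line);
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("comment", padded.substring(0, 5));
        fields.put("specType", padded.substring(5, 6));
        // The identifier is the word itself, without the padding spaces.
        fields.put("identifier", padded.substring(6, 21).trim());
        return fields;
    }

    public static void main(String[] args) {
        String line = "CCCCCDIDENTIFIER     E S             10";
        // prints {comment=CCCCC, specType=D, identifier=IDENTIFIER}
        System.out.println(split(line));
    }
}
```

No regular expression is involved, so there is no token length to limit and nothing to push back.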

Anyway, [a-zA-Z][a-zA-Z0-9]{0,14}[ ]{0,14} wouldn't be the right expression even if it did properly handle the token length, since the token is really the word at the beginning, without space characters.

In the case of fixed-length fields which contain something more complicated than a single identifier, you might want to feed the field into a lexer, using a StringReader or some such.
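A sketch of that idea, with assumptions stated up front: java.io.StreamTokenizer stands in for the real lexer only so the example runs without a generated class (a JFlex-generated scanner also takes a java.io.Reader in its constructor), and the sample line and column range are invented for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class FieldLexing {
    // Lexes only the columns [start, end) of a line (0-based, end exclusive).
    static List<String> lexField(String line, int start, int end) {
        int from = Math.min(start, line.length());
        int to = Math.min(end, line.length());
        String field = line.substring(from, to);
        // Stand-in lexer: the real one would be the JFlex-generated scanner,
        // constructed with the same StringReader.
        StreamTokenizer lexer = new StreamTokenizer(new StringReader(field));
        List<String> tokens = new ArrayList<>();
        try {
            while (lexer.nextToken() != StreamTokenizer.TT_EOF) {
                switch (lexer.ttype) {
                    case StreamTokenizer.TT_WORD:
                        tokens.add("WORD:" + lexer.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        tokens.add("NUMBER:" + (int) lexer.nval);
                        break;
                    default:
                        tokens.add("CHAR:" + (char) lexer.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical line: 6 columns of prefix, then a 14-column expression field.
        String line = "     CQTY + 10      ";
        System.out.println(lexField(line, 6, 20));
    }
}
```

Because the lexer only ever sees the substring for one field, it cannot read past the field boundary, whatever its rules look like.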

Although I'm sure it's not useful, here's a regular expression that matches exactly 15 characters: a word at the start, padded out with spaces:

[a-zA-Z][ ]{14} |
[a-zA-Z][a-zA-Z0-9][ ]{13} |
[a-zA-Z][a-zA-Z0-9]{2}[ ]{12} |
[a-zA-Z][a-zA-Z0-9]{3}[ ]{11} |
[a-zA-Z][a-zA-Z0-9]{4}[ ]{10} |
[a-zA-Z][a-zA-Z0-9]{5}[ ]{9} |
[a-zA-Z][a-zA-Z0-9]{6}[ ]{8} |
[a-zA-Z][a-zA-Z0-9]{7}[ ]{7} |
[a-zA-Z][a-zA-Z0-9]{8}[ ]{6} |
[a-zA-Z][a-zA-Z0-9]{9}[ ]{5} |
[a-zA-Z][a-zA-Z0-9]{10}[ ]{4} |
[a-zA-Z][a-zA-Z0-9]{11}[ ]{3} |
[a-zA-Z][a-zA-Z0-9]{12}[ ]{2} |
[a-zA-Z][a-zA-Z0-9]{13}[ ] |
[a-zA-Z][a-zA-Z0-9]{14}

(That might have to be put on one very long line.)
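If you did want that expression, it is probably easier to generate than to type. The sketch below builds the same alternation on a single line and checks it with java.util.regex, not with JFlex; the class and method names are invented, and it writes {0} and {1} explicitly where the hand-written version omits them, so treat pasting the result into a JFlex spec as untested.

```java
import java.util.StringJoiner;
import java.util.regex.Pattern;

class PaddedIdentifierRegex {
    // One alternative per identifier length n (1..width):
    // a letter, n-1 alphanumerics, then width-n padding spaces.
    static String build(int width) {
        StringJoiner alternatives = new StringJoiner("|");
        for (int n = 1; n <= width; n++) {
            alternatives.add("[a-zA-Z][a-zA-Z0-9]{" + (n - 1) + "}[ ]{" + (width - n) + "}");
        }
        return alternatives.toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(build(15));
        // 10 letters + 5 spaces = 15 characters
        System.out.println(p.matcher("IDENTIFIER     ").matches()); // prints true
        // 10 letters + 6 spaces = 16 characters
        System.out.println(p.matcher("IDENTIFIER      ").matches()); // prints false
    }
}
```

Every alternative has a total length of exactly 15, which is what keeps the match from running into the next field.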

The idea was to use JFlex to handle the two variants of RPG, positional and free-form. But like you said, it's not practical in the end. I had already started reading line by line, splitting, lexing the tokens, and putting them on a queue, which is polled each time the lexer advances. I'm not an expert, so I was looking for a knowledgeable opinion. Thanks! - LppEdd, May 17 '21 at 05:12
