
I am trying to implement an include feature in the lexer so that when it hits `#include "filename"` it switches to a stream of that file. I implemented it with the lexer action shown below, but when I run it, it segfaults.

```cpp
antlr4::ANTLRInputStream new_source(new_file); // new_file is an open std::ifstream

int pos = _input->index();

filestack.push(std::make_pair(_input, pos)); // my stack to keep track of previous files
reset();
_input = static_cast<antlr4::CharStream*>(&new_source);
```

I checked that the `static_cast<>` works and returns a non-null pointer, and the assignment succeeds. However, when lexing continues, it segfaults inside the recompiled ANTLR runtime. Is there something I'm missing?

UPDATE: I just recompiled the C++ runtime with debug flags on, and now I see that it fails in `LexerATNSimulator::failOrAccept` when it returns `_prevAccept.dfaState->prediction`.

Also, this is what happens before the segfault:

1. It exits the custom lexer action and the `LexerActionExecutor`.
2. It enters `LexerATNSimulator::accept`.
3. It exits `LexerATNSimulator::accept`.
4. It enters `LexerATNSimulator::failOrAccept`.
5. Segfault.

I am resetting the lexer when switching over; could that have something to do with the failure?

MikhailS

1 Answer


Just replacing the value of the input stream isn't enough. Other parts of the runtime still hold references to the current stream and lexer state, which can lead to crashes. Instead, you have to reset the lexer and the token source. It goes like this:

```cpp
lexer.reset();
lexer.setInputStream(&input); // Not just reset(): this also switches the lexer to the new stream.
tokens.setTokenSource(&lexer); // Re-attach the lexer; this also clears the token stream's buffered tokens.
```

See the MySQL Workbench code on GitHub for the full code.

Regarding the token source: the lexer is the token source, and all you can do is call `reset()` on it. See the C++ runtime source for the details of that function.
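If the lines in the question sit directly in the action body, as the question describes, `new_source` is a local variable: it is destroyed as soon as the action returns, while `_input` still points at it. A minimal sketch of keeping every opened stream alive for as long as the lexer may read from it, assuming a hypothetical `Source` type standing in for `antlr4::ANTLRInputStream` and a hypothetical `SourceRegistry` that would be a member of the lexer (the ANTLR runtime itself is not used here):

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for antlr4::ANTLRInputStream; it owns its text.
struct Source {
  std::string name;
  std::string text;
};

// Hypothetical owner for every stream opened during lexing. Each Source lives
// in its own heap allocation, so the raw pointer returned by open() stays valid
// even when the vector reallocates, until the registry itself is destroyed.
class SourceRegistry {
 public:
  // Meant to be called from the include action instead of creating a local stream.
  Source* open(std::string name, std::string text) {
    owned_.push_back(
        std::make_unique<Source>(Source{std::move(name), std::move(text)}));
    return owned_.back().get();
  }

  std::size_t size() const { return owned_.size(); }

 private:
  std::vector<std::unique_ptr<Source>> owned_;
};
```

Storing `std::unique_ptr` rather than `Source` by value is the design choice that matters: a `std::vector<Source>` would move its elements on reallocation and invalidate pointers the lexer still holds.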

Mike Lischke
  • It says it can't find `errors` or `tokens` from inside the lexer action. Where is `errors` defined? Also, `setInputStream(&new_file)` consists of only a `reset()` and the change of `_input`. – MikhailS Aug 03 '18 at 21:01
  • Even in the main function, `errors` does not seem to exist... can you point me to the source? Also, what would be the best way to reset the TokenSource internals from the lexer? – MikhailS Aug 10 '18 at 19:12
  • Ah, that was just a demonstration. Let me remove the reference to `errors` to avoid confusion. – Mike Lischke Aug 11 '18 at 08:43
  • I understand, but that works at the top level, where the full ANTLR parser is set up. I'm looking for something entirely within the scope of the lexer. The goal is `#include` support in the lexer: when it sees `#include "filename"`, it should switch the token source inside the lexer so that the input looks like one long file (albeit with a different file name in the ANTLRFileStream for each included file). – MikhailS Aug 13 '18 at 16:41
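The include behaviour described in the last comment is usually built on a stack of (stream, resume position) pairs, which is what the question's `filestack` already hints at. A minimal, runtime-independent sketch of that idea, assuming a hypothetical `IncludeReader` that reads whitespace-separated words, treats `#include "name"` as a directive, and looks file contents up in an in-memory map instead of opening real files. None of these names are ANTLR API:

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <stack>
#include <string>
#include <utility>
#include <vector>

// Hypothetical include-aware reader; this is not ANTLR runtime API.
class IncludeReader {
 public:
  // `files` maps a file name to its contents and stands in for the file system.
  explicit IncludeReader(std::map<std::string, std::string> files)
      : files_(std::move(files)) {}

  // Reads `start` plus everything it includes and returns the words in order.
  // Throws std::out_of_range if an included file is not in `files`.
  std::vector<std::string> run(const std::string& start) {
    std::vector<std::string> out;
    push(start);
    while (!stack_.empty()) {
      Frame& top = stack_.top();
      std::string word = nextWord(top);
      if (word.empty()) {  // end of this file: pop and resume the includer
        stack_.pop();
        continue;
      }
      if (word == "#include") {
        std::string name = nextWord(top);  // top.pos now points past the name
        if (name.size() >= 2 && name.front() == '"' && name.back() == '"') {
          name = name.substr(1, name.size() - 2);  // strip the quotes
        }
        push(name);  // switch to the included file
        continue;
      }
      out.push_back(word);
    }
    return out;
  }

 private:
  struct Frame {
    std::string name;     // file name, e.g. for error messages
    std::string text;     // owned copy, so it outlives the switch
    std::size_t pos = 0;  // resume position inside `text`
  };

  void push(const std::string& name) { stack_.push(Frame{name, files_.at(name), 0}); }

  // Returns the next whitespace-separated word, or "" at end of input.
  static std::string nextWord(Frame& f) {
    auto isSpace = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
    while (f.pos < f.text.size() && isSpace(f.text[f.pos])) ++f.pos;
    std::size_t begin = f.pos;
    while (f.pos < f.text.size() && !isSpace(f.text[f.pos])) ++f.pos;
    return f.text.substr(begin, f.pos - begin);
  }

  std::map<std::string, std::string> files_;
  std::stack<Frame> stack_;
};
```

For a `main.txt` containing `a #include "b.txt" d` and a `b.txt` containing `b c`, `run("main.txt")` returns the words `a`, `b`, `c`, `d`, which is the "one long file" effect described above. Include cycles are not detected in this sketch; a file that includes itself would keep pushing frames.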