Questions tagged [lexer]

A program converting a sequence of characters into a sequence of tokens

A lexer is a program that converts a sequence of characters into a sequence of tokens. It is also commonly called a scanner. A lexer is often implemented as a single function that is called by a parser or another function.
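For illustration, a lexer written as a single function might look like the following minimal Python sketch. The token categories (`NUMBER`, `IDENT`, `OP`) and the regex-based design are assumptions chosen for the example, not taken from any particular tool.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # token category, e.g. "NUMBER"
    text: str  # the characters (lexeme) the token was built from


# Hypothetical token set for a tiny expression language, in priority order.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/()=]"),
    ("SKIP", r"[ \t\n]+"),  # whitespace: matched, but never emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Convert a sequence of characters into a sequence of tokens."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":
            yield Token(m.lastgroup, m.group())
        pos = m.end()
```

Calling `list(tokenize("x = 42 + y"))` yields five tokens: `IDENT 'x'`, `OP '='`, `NUMBER '42'`, `OP '+'`, `IDENT 'y'`. Because `tokenize` is a generator, a parser can pull tokens from it one at a time instead of building the whole list first.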

1050 questions

23 votes · 4 answers

ANTLR: What is the simplest way to implement a Python-like indentation-dependent grammar?

I am trying to implement a Python-like indentation-dependent grammar. Source example: ABC QWE CDE EFG EFG CDE ABC QWE ZXC As I see it, what I need is to implement two tokens, INDENT and DEDENT, so I could write something like: grammar mygrammar; text: (ID…
Astronavigator

21 votes · 5 answers

Good parser generator (think lex/yacc or antlr) for .NET? Build time only?

Is there a good parser generator (think lex/yacc or antlr) for .NET? Any that have a license that would not scare lawyers? Lots of LGPL, but I am working on embedded components and some organizations are not comfortable with me taking an LGPL…
Eric Schoonover

19 votes · 5 answers

Generate AST of a PHP source file

I want to parse a PHP source file into an AST (preferably as a nested array of instructions). I basically want to convert things like f($a, $b + 1) into something like array( 'function_call', array( array( 'var', '$a' ), array(…
Dogbert

19 votes · 1 answer

ANTLR4: what does ATN stand for?

I'm surprised this is not explained anywhere on the ANTLR website nor in any of the documentation, but what does ATN (not ANT) stand for? Knowing what the acronym stands for would help me understand the role of the ATN, ATNSimulator, etc. components…
rolling_codes

18 votes · 3 answers

Does C# have (direct) flex/yacc port? Or what lexer/parser people use for C#?

I might be wrong, but it looks like there's no direct flex/bison (lex/yacc) port for C#/.NET so far. For LALR parsers I found GPPG/GPLEX, and for LL parsers there is the well-known ANTLR. But I want to reuse my flex/bison grammar as much as…
prosseek

18 votes · 1 answer

How to define tokens that can appear in multiple lexical modes in ANTLR4?

I am learning ANTLR4 and was trying to play with lexical modes. How can I have the same token appear in multiple lexical modes? As a very simple example, let's say my grammar has two modes, and I want to match white space and end-of-lines in both of…
medhat

17 votes · 1 answer

%option noinput nounput: what are they for?

I am new to this, so I wondered why I need to use these directives: %option nounput %option noinput Yes, I am aware that otherwise I'd get these warnings: lex.yy.c:1237:17: warning: ‘yyunput’ defined but not used [-Wunused-function] static…
zeroDivider

15 votes · 1 answer

Using ANTLR Parser and Lexer Separately

I used ANTLR version 4 to create a compiler. The first phase was the lexer. I created a "CompilerLexer.g4" file and put the lexer rules in it. It works fine. CompilerLexer.g4: lexer grammar CompilerLexer; INT : 'int' ; //1 FLOAT :…
user2998131

14 votes · 3 answers

OCaml + Menhir Compiling/Writing

I'm a complete newbie when it comes to OCaml. I've only recently started using the language (about 2 weeks ago), but unfortunately, I've been tasked with making a syntax analyzer (parser + lexer, whose function is to either accept or reject a sentence)…
Lopson

14 votes · 2 answers

Lex strings with single, double, or triple quotes

My objective is to parse strings the way Python does. Question: How do I write a lexer that supports the following: "string..." 'string...' """multi line string \n \n end""" '''multi line string \n \n end''' Some code: states = ( ('string',…
Steve Peak
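One common way to handle the four quote styles in the question above is to try the triple-quoted forms before the single-quoted ones, so that `"""` is not lexed as an empty string followed by a stray quote. The sketch below uses only Python's `re` module rather than any lexer generator; the escape rules (backslash escapes everywhere, newlines allowed only inside triple quotes) are assumptions modeled loosely on Python.

```python
import re
from typing import Optional

# Tried left to right: the triple-quoted forms come first, so '"""' is not
# lexed as the empty string "" followed by a stray quote.
STRING_ALTERNATIVES = [
    r'"""(?:\\[\s\S]|[^\\])*?"""',  # triple double quotes; may span lines
    r"'''(?:\\[\s\S]|[^\\])*?'''",  # triple single quotes; may span lines
    r'"(?:\\.|[^"\\\n])*"',         # double quotes; a single line
    r"'(?:\\.|[^'\\\n])*'",         # single quotes; a single line
]
STRING = re.compile("|".join(STRING_ALTERNATIVES))


def match_string(source: str, pos: int = 0) -> Optional[str]:
    """Return the string literal starting at `pos` (quotes included), or None."""
    m = STRING.match(source, pos)
    return m.group() if m else None
```

The lazy `*?` in the triple-quoted alternatives stops at the first unescaped closing `"""` (or `'''`), so a multi-line string ends where the question's examples expect it to.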

14 votes · 2 answers

In an ANTLR4 lexer, how do I write a rule that catches all remaining "words" as an Unknown token?

I have an ANTLR4 lexer grammar. It has many rules for words, but I also want it to create an Unknown token for any word that cannot be matched by other rules. I have something like this: Whitespace : [ \t\n\r]+ -> skip; Punctuation : [.,:;?!]; //…
mdakin
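ANTLR lexers choose the rule that matches the longest input and break ties in favor of the rule defined first, so a catch-all rule listed last only wins when no specific rule matches as much text. The Python sketch below emulates that policy to show why a trailing `UNKNOWN` rule behaves as the question asks; the rule names and patterns are hypothetical, and this is not ANTLR's actual implementation.

```python
import re
from typing import List, Tuple

# Hypothetical rules in grammar order; the catch-all UNKNOWN rule comes LAST.
RULES = [
    ("WS", re.compile(r"[ \t\r\n]+")),
    ("PUNCTUATION", re.compile(r"[.,:;?!]")),
    ("GREETING", re.compile(r"hello|hi")),
    ("UNKNOWN", re.compile(r"[^ \t\r\n.,:;?!]+")),  # any other run of characters
]


def tokenize_words(text: str) -> List[Tuple[str, str]]:
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strictly greater: an equally long match by a later rule does not
            # replace the earlier rule, which is how ties are broken.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise SyntaxError(f"no rule matches at offset {pos}")
        if best_name != "WS":  # mimic '-> skip'
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens
```

With these rules, `hello` becomes `GREETING` (a tie with `UNKNOWN`, won by the earlier rule), while `helloworld` and `history` become `UNKNOWN` because the catch-all matches more characters than `GREETING` does.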

13 votes · 2 answers

How would you parse indentation (python style)?

How would you define your parser and lexer rules to parse a language that uses indentation to define scope? I have already googled and found a clever approach for parsing it by generating INDENT and DEDENT tokens in the lexer. I will go deeper on…
Null303
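The INDENT/DEDENT approach mentioned in the two indentation questions above is usually implemented with a stack of indentation widths, the algorithm the Python language reference describes for its own tokenizer. Here is a minimal Python sketch, assuming spaces-only indentation and emitting one hypothetical `LINE` token per non-blank line in place of real tokens:

```python
from typing import List, Tuple

Token = Tuple[str, str]


def indent_tokens(source: str) -> List[Token]:
    """Emit INDENT/DEDENT around hypothetical LINE tokens, Python-style.

    Assumes spaces-only indentation; blank lines are ignored.
    """
    stack = [0]  # indentation widths of the currently open blocks
    tokens: List[Token] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines never open or close a block
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)  # deeper: open exactly one new block
            tokens.append(("INDENT", ""))
        while width < stack[-1]:
            stack.pop()  # shallower: close blocks until widths line up
            tokens.append(("DEDENT", ""))
        if width != stack[-1]:
            raise IndentationError(f"line {lineno}: unindent does not match any outer level")
        tokens.append(("LINE", line.strip()))
    tokens.extend(("DEDENT", "") for _ in stack[1:])  # close blocks still open at EOF
    return tokens
```

For the input `"ABC QWE\n  CDE EFG\n    ABC\nQWE ZXC"` this produces `LINE, INDENT, LINE, INDENT, LINE, DEDENT, DEDENT, LINE`; a parser can then treat INDENT and DEDENT like opening and closing braces.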

13 votes · 2 answers

Why is this assembly code faster?

I'm experimenting with a lexer, and I found that switching from a while-loop to an if-statement and a do-while-loop in one part of the program led to ~20% faster code, which seemed crazy. I isolated the difference in the compiler generated code to…
briangreenery

12 votes · 3 answers

Lexing partial SQL in C#

I need to parse partial SQL queries (it's for a SQL injection auditing tool). For example, '1' AND 1=1-- should break down into tokens like [0] => [SQL_STRING, '1'] [1] => [SQL_AND] [2] => [SQL_INT, 1] [3] => [SQL_AND] [4] => [SQL_INT, 1] [5] =>…
Christopher Tarquini
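A regex-based sketch of such a lexer, written in Python rather than C# and assuming the token names below (only `SQL_STRING`, `SQL_AND`, and `SQL_INT` come from the question; the others are hypothetical). Because partial SQL from an injection payload may contain an unbalanced quote, anything unrecognized becomes `SQL_UNKNOWN` instead of raising an error.

```python
import re
from typing import List, Tuple

# Alternatives are tried in order, so keywords precede SQL_IDENT.
SQL_TOKEN_SPEC = [
    ("SQL_COMMENT", r"--[^\n]*"),        # '--' comment runs to end of line
    ("SQL_STRING", r"'(?:''|[^'])*'"),   # '' inside a string is an escaped quote
    ("SQL_INT", r"\d+"),
    ("SQL_AND", r"(?i:AND)\b"),          # keywords are case-insensitive
    ("SQL_OR", r"(?i:OR)\b"),
    ("SQL_IDENT", r"[A-Za-z_]\w*"),
    ("SQL_EQ", r"="),
    ("WS", r"\s+"),
    ("SQL_UNKNOWN", r"."),               # e.g. an unbalanced quote in a payload
]
SQL_MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in SQL_TOKEN_SPEC))
VALUELESS = {"SQL_AND", "SQL_OR", "SQL_EQ"}  # tokens that carry no value


def lex_sql(fragment: str) -> List[Tuple]:
    tokens: List[Tuple] = []
    for m in SQL_MASTER.finditer(fragment):
        kind, text = m.lastgroup, m.group()
        if kind == "WS":
            continue
        if kind == "SQL_STRING":
            tokens.append((kind, text[1:-1].replace("''", "'")))  # strip quotes, unescape
        elif kind == "SQL_INT":
            tokens.append((kind, int(text)))
        elif kind in VALUELESS:
            tokens.append((kind,))
        else:
            tokens.append((kind, text))
    return tokens
```

For `'1' AND 1=1--` it returns `[('SQL_STRING', '1'), ('SQL_AND',), ('SQL_INT', 1), ('SQL_EQ',), ('SQL_INT', 1), ('SQL_COMMENT', '--')]`.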

12 votes · 2 answers

direct-coded vs table-driven lexer?

I'm new to the world of compiler construction. I want to know the differences between direct-coded and table-driven lexical analyzers. Please use a simple source code example if possible. Thanks. Edit: in the book Engineering a Compiler, the author…
Ahmed T. Ali
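To make the distinction concrete, here is a minimal Python sketch of both styles recognizing the same hypothetical token pattern: `r` followed by one or more digits (such as `r17`). The table-driven version stores the DFA as data and walks it with a generic loop; the direct-coded version turns each DFA state into its own piece of control flow. For brevity, both accept or reject a whole string instead of scanning for the longest prefix, as a real scanner would.

```python
# DFA for the hypothetical token pattern r[0-9]+:
# state 0 = start, 1 = saw 'r', 2 = saw 'r' plus at least one digit (accepting).
ERROR = -1


def char_class(c: str) -> str:
    """Map a character to the column used in the transition table."""
    if c == "r":
        return "r"
    if "0" <= c <= "9":
        return "digit"
    return "other"


# Table-driven: the automaton is data, walked by one generic driver loop.
TRANSITIONS = {
    0: {"r": 1, "digit": ERROR, "other": ERROR},
    1: {"r": ERROR, "digit": 2, "other": ERROR},
    2: {"r": ERROR, "digit": 2, "other": ERROR},
}
ACCEPTING = {2}


def table_driven(text: str) -> bool:
    state = 0
    for c in text:
        state = TRANSITIONS[state][char_class(c)]
        if state == ERROR:
            return False
    return state in ACCEPTING


# Direct-coded: each DFA state becomes its own piece of control flow.
def direct_coded(text: str) -> bool:
    i, n = 0, len(text)
    # State 0: the only valid move is on 'r'.
    if i < n and text[i] == "r":
        i += 1
    else:
        return False
    # State 1: the only valid move is on a digit.
    if i < n and "0" <= text[i] <= "9":
        i += 1
    else:
        return False
    # State 2 (accepting): loop on digits; accept only at end of input.
    while i < n and "0" <= text[i] <= "9":
        i += 1
    return i == n
```

Both functions agree on every input; the difference is where the automaton lives. Changing the token pattern means editing `TRANSITIONS` in the first version but rewriting code in the second, while the direct-coded form avoids a table lookup per character, which is the usual argument for its speed.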