Questions tagged [lexer]

A program converting a sequence of characters into a sequence of tokens

A lexer is a program that converts a sequence of characters into a sequence of tokens. It is also often called a scanner or tokenizer. A lexer is often implemented as a single function that a parser or other code calls, typically to obtain the next token.
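As a minimal sketch of the "single function called by a parser" shape described above, here is a regex-driven lexer in Python. The token kinds (`NUMBER`, `IDENT`, `OP`), the `Token` type and the `tokenize` name are hypothetical choices for a tiny arithmetic language, not part of any particular tool.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token set for a tiny arithmetic language.
# Alternatives are tried in order; the first one that matches wins.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/()=]"),
    ("SKIP", r"\s+"),  # whitespace is matched but not emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Convert a sequence of characters into a sequence of tokens."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind = match.lastgroup  # name of the alternative that matched
        if kind != "SKIP":
            yield Token(kind, match.group())
        pos = match.end()
```

Calling `list(tokenize("x = 3 + 42"))` yields `IDENT`, `OP`, `NUMBER`, `OP`, `NUMBER` tokens; a parser would then consume these tokens instead of raw characters, which is the main difference between the two roles.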

1050 questions
7 votes, 5 answers

Is the word "lexer" a synonym for the word "parser"?

The title is the question: Are the words "lexer" and "parser" synonyms, or are they different? It seems that Wikipedia uses the words interchangeably, but English is not my native language so I can't be sure.
Seth Carnegie
7 votes, 3 answers

How do I get an Antlr Parser rule to read from both default AND hidden channel

I use the normal approach of sending whitespace to the hidden channel, but I have one rule where I would like to include any whitespace for later processing. Every example I have found, however, requires some very strange manual coding. Is there no easy option to…
David Mårtensson
7 votes, 3 answers

Standard format for concrete and abstract syntax trees

I have an idea for a hobby project which performs some code analysis and manipulation. This project will require both the concrete and abstract syntax trees of a given source file. Additionally, bi-directional references between the two trees would…
Brandon Bloom
7 votes, 2 answers

attribute references not allowed in lexer actions

I found a simple grammar to start learning ANTLR. I put it in the myGrammar.g file. Here is the grammar: grammar myGrammar; /* This will be the entry point of our parser. */ eval : additionExp ; /* Addition and subtraction have the…
Ali Salehi
7 votes, 2 answers

Unable to compile output of lex

When I attempt to compile the output of this trivial lex program: # lex.l integer printf("found keyword INT"); using: $ gcc lex.yy.c I get: Undefined symbols: "_yywrap", referenced from: _yylex in ccMsRtp7.o _input in ccMsRtp7.o …
dstnbrkr
7 votes, 4 answers

lexers / parsers for (un) structured text documents

There are lots of parsers and lexers for scripts (i.e. structured computer languages), but I'm looking for one that can break an (almost) unstructured text document into larger sections, e.g. chapters, paragraphs, etc. It's relatively easy for a…
wilson32
7 votes, 2 answers

Is C++ code generation in ANTLR 3.2 ready?

I was trying hard to make ANTLR 3.2 generate parser/lexer in C++. It was fruitless. Things went well with Java & C though. I was using this tutorial to get started: http://www.ibm.com/developerworks/aix/library/au-c_plusplus_antlr/index.html When I…
Viet
7 votes, 1 answer

How do you write a lexer parser where identifiers may begin with keywords?

Suppose you have a language where identifiers might begin with keywords. For example, suppose "case" is a keyword, but "caser" is a valid identifier. Suppose also that the lexer rules can only handle regular expressions. Then it seems that I…
BenRI
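For the keyword-prefix problem this question describes, one common approach (sketched here independently of the question's answer) is maximal munch: match the longest identifier-shaped lexeme first, then classify it by looking it up in a keyword table, so "caser" never splits into "case" + "r". Lex-style generators behave similarly, preferring the longest match and breaking ties by rule order. The keyword set and the `lex_words` function are hypothetical; only "case" comes from the question.

```python
import re
from typing import List, Tuple

# Hypothetical keyword set; only "case" is taken from the question.
KEYWORDS = {"case", "if", "else"}
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")  # greedy: takes the longest word
SPACE = re.compile(r"\s+")


def lex_words(source: str) -> List[Tuple[str, str]]:
    """Hypothetical lexer: longest match first, then keyword lookup."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(source):
        space = SPACE.match(source, pos)
        if space:
            pos = space.end()
            continue
        word = WORD.match(source, pos)
        if word is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        text = word.group()
        # Classify only after the whole word is consumed, so a keyword
        # prefix ("case" in "caser") cannot end the token early.
        kind = "KEYWORD" if text in KEYWORDS else "IDENT"
        tokens.append((kind, text))
        pos = word.end()
    return tokens
```

With this design, `lex_words("case caser")` produces a `KEYWORD` token for "case" and an `IDENT` token for "caser". Keeping keywords out of the regular expressions also keeps the automaton small.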
7 votes, 1 answer

ANTLR: Space indentation?

I want to create a very simple grammar with space indentation. Each line consists of one or more words, indented as in Python (4 spaces or a tab is one indent), and there is no explicit token that closes an indentation level, for example: if something cool occurs do…
Elliot Chance
7 votes, 1 answer

most efficient way to parse this scripting language

I'm implementing an interpreter for a long-outdated text editor's scripting language, and I'm having some trouble getting a lexer to work properly. Here's an example of the problematic part of the language: T L /LOCATE ME/ C /LOCATE ME/CHANGED ME/ *…
Robbie Rosati
6 votes, 3 answers

Determining "Mood" of Textual Phrases through Lexical Analysis

I am looking to apply scores (positive, negative or neutral) to short phrases of text. Short of parsing out emoticons and making assumptions based on their usage, I'm unsure of what else to try. Can anyone provide examples, research papers,…
Michael Wales
6 votes, 2 answers

How can I simplify token prediction DFA?

Lexer DFA results in "code too large" error. I'm trying to parse JavaServer Pages using ANTLR 3. Java has a limit of 64k for the bytecode of a single method, and I keep running into a "code too large" error when compiling the Java source generated…
erickson
6 votes, 2 answers

Lexer that recognizes indented blocks

I want to write a compiler for a language that denotes program blocks with whitespace, as in Python. I prefer to do this in Python, but C++ is also an option. Is there an open-source lexer that can help me do this easily, for example by…
Elektito
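A common way to lex indentation-based blocks, and the one the Python language reference describes for Python itself, is to keep a stack of indentation widths and emit INDENT/DEDENT tokens whenever a line's indentation changes. The sketch below assumes a tab width of 4 (Python itself uses tab stops of 8), and the `indent_tokens` function and the `LINE` token kind are hypothetical. For simplicity it keeps each line's content as one token rather than tokenizing it further.

```python
from typing import List, Tuple


def indent_tokens(source: str, tab_width: int = 4) -> List[Tuple[str, str]]:
    """Hypothetical sketch: emit INDENT/DEDENT from a stack of indentation widths."""
    stack = [0]  # indentation widths of the currently open blocks
    tokens: List[Tuple[str, str]] = []
    for line in source.splitlines():
        stripped = line.lstrip(" \t")
        if not stripped:  # blank lines do not affect indentation
            continue
        leading = line[: len(line) - len(stripped)]
        width = len(leading.expandtabs(tab_width))  # assumed tab width
        if width > stack[-1]:  # deeper: open one new block
            stack.append(width)
            tokens.append(("INDENT", ""))
        while width < stack[-1]:  # shallower: close blocks until widths match
            stack.pop()
            tokens.append(("DEDENT", ""))
        if width != stack[-1]:
            raise IndentationError(f"dedent does not match any outer level: {line!r}")
        tokens.append(("LINE", stripped))  # a real lexer would tokenize the line here
    while len(stack) > 1:  # close any blocks still open at end of input
        stack.pop()
        tokens.append(("DEDENT", ""))
    return tokens
```

Because INDENT and DEDENT behave like explicit opening and closing braces, the parser can then treat blocks with an ordinary context-free grammar.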
6 votes, 2 answers

Using Alex in Haskell to make a lexer that parses Dice Rolls

I'm making a parser for a DSL in Haskell using Alex + Happy. My DSL uses dice rolls as part of the possible expressions. Sometimes I have an expression that I want to parse that looks like: [some code...] 3D6 [... rest of the code] Which should…
Zeb
6 votes, 4 answers

Recursive Descent Parser for something simple?

I'm writing a parser for a templating language which compiles into JS (if that's relevant). I started out with a few simple regexes, which seemed to work, but regexes are very fragile, so I decided to write a parser instead. I started by writing a…
ltimer