Questions tagged [lexer]

A program converting a sequence of characters into a sequence of tokens

A lexer is a program that converts a sequence of characters into a sequence of tokens. It is also often called a scanner. A lexer is often implemented as a single function that is called by a parser or another function.
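
As a rough illustration of the description above, here is a minimal sketch of a lexer written as a single generator function that a parser could call. The token kinds (`NUMBER`, `IDENT`, `OP`, `SKIP`), the `Token` type, and the `tokenize` name are all assumptions made for this example, not part of any particular tool; it uses only Python's standard `re` module.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # token category, e.g. "IDENT"
    text: str  # the exact characters that were matched


# Hypothetical token set for a tiny C-like language. Order matters:
# the first alternative that matches at a position wins.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[=+\-*/;]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens from source, skipping whitespace."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at offset {pos}"
            )
        pos = match.end()
        if match.lastgroup != "SKIP":
            yield Token(match.lastgroup, match.group())


# Usage:
# [(t.kind, t.text) for t in tokenize("a = 1;")]
# -> [('IDENT', 'a'), ('OP', '='), ('NUMBER', '1'), ('OP', ';')]
```

Because `tokenize` is a generator, a parser can pull tokens one at a time instead of lexing the whole input up front. Generated lexers (flex, ANTLR, and similar) typically work from the same kind of rule list, but this sketch makes no claim about how any of them is implemented.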

1050 questions
10
votes
2 answers

Parsing optional semicolon at statement end

I am writing a parser for C-like grammars. So far it can parse code like: a = 1; b = 2; Now I want to make the semicolon at the end of a line optional. The original YACC rule was: stmt: expr ';' { ... } Where the new line is processed by…
shouya
  • 2,863
  • 1
  • 24
  • 45
9
votes
2 answers

Writing a custom Xtext/ANTLR lexer without a grammar file

I'm writing an Eclipse/Xtext plugin for CoffeeScript, and I realized I'll probably need to write a lexer for it by hand. The CoffeeScript parser also uses a hand-written lexer to handle indentation and other tricks in the grammar. Xtext generates a…
Adam Schmideg
  • 10,590
  • 10
  • 53
  • 83
9
votes
1 answer

ANTLR Parser with manual lexer

I'm migrating a C#-based programming language compiler from a manual lexer/parser to Antlr. Antlr has been giving me severe headaches because it usually mostly works, but then there are the small parts that do not and are incredibly painful to…
luiscubal
  • 24,773
  • 9
  • 57
  • 83
9
votes
1 answer

Parser vs. lexer and XML

I'm reading about compiler and parser architecture now and I wonder about one thing... When you have XML, XHTML, HTML or any SGML-based language, what would be the role of a lexer here and what would be the tokens? I've read that tokens are like…
SasQ
  • 14,009
  • 7
  • 43
  • 43
9
votes
1 answer

Using menhir with sedlex

I need to use menhir with sedlex for whatever reason (utf-8), but don't know how to make the generated parser depend on Sedlexing instead of Lexing. Any tips? When I run menhir --infer parser.mly the generated program has lines with Lexing.... I…
Olle Härstedt
  • 3,799
  • 1
  • 24
  • 57
9
votes
4 answers

ANTLR grammar: parser- and lexer literals

What's the difference between this grammar: ... if_statement : 'if' condition 'then' statement 'else' statement 'end_if'; ... and this: ... if_statement : IF condition THEN statement ELSE statement END_IF; ... IF : 'if'; THEN: 'then'; ELSE:…
BB.
  • 113
  • 3
9
votes
0 answers

What is the easiest way to extract all string literals from a C# file?

I need to extract all string literals from a given C# file. All conditional compilation constants (e.g. #if DEBUG) are assumed to be false, and the file can be assumed to be syntactically correct. Both single-line ("a\u1000b") and verbatim…
Nik Z.
  • 309
  • 2
  • 4
9
votes
2 answers

Matching arbitrary text (both symbols and spaces) with ANTLR?

How to match any text in ANTLRv4? I mean text that is unknown at the time of grammar writing. My grammar is as follows: grammar Anytext; line : comment; comment : '#' anytext; anytext: ANY*; WS : [ \t\r\n]+; ANY : .; And my code is…
Suzan Cioc
  • 29,281
  • 63
  • 213
  • 385
9
votes
2 answers

How to manually construct an AST?

I'm currently learning about parsing but I'm a bit confused as to how to generate an AST. I have written a parser that correctly verifies whether an expression conforms to a grammar (it is silent when the expression conforms and raises an exception…
horseyguy
  • 29,455
  • 20
  • 103
  • 145
8
votes
3 answers

Controlling Python PLY lexer states from parser

I am working on a simple SQL select like query parser and I need to be able to capture subqueries that can occur at certain places literally. I found lexer states are the best solution and was able to do a POC using curly braces to mark the start…
haridsv
  • 9,065
  • 4
  • 62
  • 65
8
votes
0 answers

Elegant way to parse "line splices" (backslashes followed by a newline) in megaparsec

For a small compiler project, we are currently implementing a compiler for a subset of C, for which we decided to use Haskell and megaparsec. Overall we have made good progress, but there are still some corner cases that we cannot correctly…
Chirs
  • 567
  • 2
  • 15
8
votes
4 answers

How to efficiently build an interpreter (lexer+parser) in C?

I'm trying to make a meta-language for writing markup code (such as xml and html) which can be directly embedded into C/C++ code. Here is a simple sample written in this language, I call it WDI (Web Development Interface): /* * Simple wdi/html…
Rizo
  • 3,003
  • 5
  • 34
  • 49
8
votes
2 answers

How would I go about Implementing A Simple Stack-Based Programming Language

I am interested in extending my knowledge of computer programming by implementing a stack-based programming language. I am seeking out advice on where to begin, as I intend for it to have functions like "pushint 1" which would push an integer with…
jszaday
  • 322
  • 1
  • 4
  • 12
7
votes
1 answer

Characters Matching Multiple Lexer Rules in ANTLR

I've defined multiple lexer rules that potentially match the same character sequence. For example: LBRACE: '{' ; RBRACE: '}' ; LPARENT: '(' ; RPARENT: ')' ; LBRACKET: '[' ; RBRACKET: ']' ; SEMICOLON: ';' ; ASTERISK: '*' ; AMPERSAND: '&' …
JavaMan
  • 4,954
  • 4
  • 41
  • 69
7
votes
7 answers

Lexer/parser tools

Which lexer/parser generator is the best (easiest to use, fastest) for C or C++? I'm using flex and bison right now, but bison only handles LALR(1) grammars. The language I'm parsing doesn't really need unlimited lookahead, but unlimited lookahead…
Zifre
  • 26,504
  • 11
  • 85
  • 105