Questions tagged [lexer]

A program converting a sequence of characters into a sequence of tokens

A lexer is a program that converts a sequence of characters into a sequence of tokens. It is also often called a scanner. A lexer is often implemented as a single function that is called by a parser or another function.
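
As a rough illustration of that definition, here is a minimal sketch of a lexer written as a single generator function. The token kinds (`NUMBER`, `IDENT`, `ASSIGN`, `OP`, `SEMI`) and the names `Token`, `TOKEN_SPEC`, and `tokenize` are assumptions made up for this example; they do not come from any particular language or tool.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str   # token category, e.g. "IDENT"
    text: str   # the exact characters that were matched


# Hypothetical token set, chosen only for illustration.
# Order matters: earlier alternatives win when several could match.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP", r"[+\-*/]"),
    ("SEMI", r";"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens from source, skipping whitespace."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # No rule matches at this offset: report a lexical error.
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = match.end()
        if match.lastgroup != "SKIP":
            yield Token(match.lastgroup, match.group())
```

A parser would typically pull tokens from `tokenize(...)` one at a time; calling `list(tokenize("x := 1 + y;"))` collects them all for inspection.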

1050 questions
12
votes
2 answers

Is it bad idea using regex to tokenize string for lexer?

I'm not sure how I'm going to tokenize source for a lexer. For now, I can only think of using regex to parse the string into an array with given rules (identifiers, symbols such as +, -, etc.). For instance, begin x:=1;y:=2; then I want to tokenize word, variable…
REALFREE
  • 4,378
  • 7
  • 40
  • 73
12
votes
2 answers

Nested generic syntax ambiguity >>

Apparently, C# is as susceptible to '>>' lexer dilemma as is C++. This C# code is pretty valid, it compiles and runs just fine: var List = new Dummy("List"); var Nullable = new Dummy("Nullable"); var Guid = new Dummy("Guid"); var x =…
Oleg Mihailik
  • 2,514
  • 2
  • 19
  • 32
11
votes
2 answers

How does the C/C++ compiler distinguish the uses of the * operator (pointer, dereference operator, multiplication operator)?

How, in the C and C++ languages, can the compiler distinguish * when it is used as a pointer (MyClass* class), as a multiplication operator (a * b), or as a dereferencing operator (*my_var)?
Pinnaker
  • 172
  • 13
11
votes
4 answers

Writing a code formatting tool for a programming language

I'm looking into the feasibility of writing a code formatting tool for the Apex language, a Salesforce.com variation on Java, and perhaps VisualForce, its tag-based markup language. I have no idea where to start this, apart from feeling/knowing…
Steven Herod
  • 764
  • 8
  • 20
11
votes
3 answers

Where should I draw the line between lexer and parser?

I'm writing a lexer for the IMAP protocol for educational purposes and I'm stumped as to where I should draw the line between lexer and parser. Take this example of an IMAP server response: * FLAGS (\Answered \Deleted) This response is defined in…
duck9
  • 406
  • 3
  • 18
11
votes
1 answer

How does the ANTLR lexer disambiguate its rules (or why does my parser produce "mismatched input" errors)?

Note: This is a self-answered question that aims to provide a reference about one of the most common mistakes made by ANTLR users. When I test this very simple grammar: grammar KeyValues; keyValueList: keyValue*; keyValue: key=IDENTIFIER '='…
Lucas Trzesniewski
  • 50,214
  • 11
  • 107
  • 158
11
votes
1 answer

Lexical Analyser In Java

I have been trying to write a simple lexical analyzer in Java. The file Token.java looks as follows: import java.util.regex.Matcher; import java.util.regex.Pattern; public enum Token { TK_MINUS ("-"), TK_PLUS ("\\+"), TK_MUL…
Vicky
  • 113
  • 1
  • 1
  • 5
11
votes
1 answer

How to parse template languages in Ragel?

I've been working on a parser for simple template language. I'm using Ragel. The requirements are modest. I'm trying to find [[tags]] that can be embedded anywhere in the input string. I'm trying to parse a simple template language, something that…
Tobias Lütke
  • 1,028
  • 1
  • 9
  • 9
11
votes
3 answers

Lexer/parser to generate Scala code from BNF grammar

I'm currently looking for a lexer/parser that generates Scala code from a BNF grammar (an ocamlyacc file with precedence and associativity). I'm quite confused since I found almost nothing on how to do it. For parsing, I found scala-bison (that I…
Vinz
  • 5,997
  • 1
  • 31
  • 52
11
votes
1 answer

ANTLR4: TokenStreamRewriter output doesn't have proper format (removes whitespaces)

I am using Antlr4 and java7 grammar (source) for modifying an input Java Source file. More specifically, I am using the TokenStreamRewriter class to modify some tokens. The following code is a sample that shows how the tokens are modified: …
Mike B
  • 1,522
  • 1
  • 14
  • 24
11
votes
6 answers

Parser How To in .NET

I'd like to understand how to construct a parser in .NET to process source files. For example, maybe I could begin by learning how to parse SQL or HTML or CSS and then act on the results to be able to format them for readability or something…
Rudy
  • 920
  • 9
  • 19
10
votes
1 answer

how to write custom InlineLexer rule for marked.js?

With Marked I can easily override/add/change lexer rules during implementation, and it's great! For example, I can force the use of a space between the hash sign and text to make a header like this: var lexer = new…
Alexander Arutinyants
  • 1,619
  • 2
  • 23
  • 49
10
votes
1 answer

Is there a working C++ grammar file for ANTLR?

Are there any existing C++ grammar files for ANTLR? I'm looking to lex, not parse some C++ source code files. I've looked on the ANTLR grammar page and it looks like there is one listed created by Sun Microsystems here. However, it seems to be a…
c14ppy
  • 247
  • 3
  • 6
10
votes
3 answers

How should I handle lexical errors in my Flex lexer?

I'm currently trying to write a small compiler using Flex+Bison but I'm kind of lost in terms of what to do with error handling, especially how to make everything fit together. To motivate the discussion, consider the following lexer fragment I'm…
hugomg
  • 68,213
  • 24
  • 160
  • 246
10
votes
1 answer

Non-left-recursive PEG grammar for an "expression"

It's either a simple identifier (like cow), something surrounded by brackets ((...)), something that looks like a method call (...(...)), or something that looks like a member access (thing.member): def expr = identifier | "(" ~> expr <~…
Li Haoyi
  • 15,330
  • 17
  • 80
  • 137