Questions tagged [lexical-analysis]

Process of converting a sequence of characters into a sequence of tokens.

In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. A program or function that performs lexical analysis is called a lexical analyzer, lexer, tokenizer, or scanner.

The lexical syntax is usually a regular language, whose atoms are individual characters, while the phrase syntax is usually a context-free language, whose atoms are words (the tokens produced by the lexer). While this separation is common, a lexer can alternatively be combined with the parser, as in scannerless parsing.

843 questions
8
votes
1 answer

How to efficiently implement longest match in a lexer generator?

I'm interested in learning how to write a lexer generator like flex. I've been reading "Compilers: Principles, Techniques, and Tools" (the "dragon book"), and I have a basic idea of how flex works. My initial approach is this: the user will supply a…
gsgx
  • 12,020
  • 25
  • 98
  • 149
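A hedged sketch of the longest-match ("maximal munch") rule in Python: try every rule at the current position, keep the longest match, and break ties by rule order, as flex does. (flex itself is more efficient: it compiles all rules into a single DFA and remembers the last accepting state seen. The rule names below are made up for illustration.)

```python
import re

# Hypothetical token rules, in priority order (earlier wins ties),
# mirroring how flex breaks ties between equally long matches.
RULES = [
    ("IF",     re.compile(r"if")),
    ("IDENT",  re.compile(r"[a-zA-Z_]\w*")),
    ("NUMBER", re.compile(r"\d+")),
    ("SKIP",   re.compile(r"\s+")),
]

def tokenize(text):
    pos = 0
    while pos < len(text):
        best = None  # (lexeme, rule name) of the longest match so far
        for name, rx in RULES:
            m = rx.match(text, pos)
            if m and (best is None or len(m.group()) > len(best[0])):
                best = (m.group(), name)
        if best is None:
            raise SyntaxError(f"no rule matches at position {pos}")
        lexeme, name = best
        if name != "SKIP":
            yield (name, lexeme)
        pos += len(lexeme)
```

Note how `iffy` lexes as one IDENT rather than IF followed by `fy`, which is exactly the longest-match behavior the question asks about.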
8
votes
2 answers

How do I implement a two-pass scanner using Flex?

As a pet-project, I'd like to attempt to implement a basic language of my own design that can be used as a web-scripting language. It's trivial to run a C++ program as an Apache CGI, so the real work lies in how to parse an input file containing…
dmercer
  • 397
  • 5
  • 17
7
votes
1 answer

How to parse a tab-separated line of text in Ruby?

I find Ruby's each function a bit confusing. If I have a line of text, an each loop will give me every space-delimited word rather than each individual character. So what's the best way of retrieving sections of the string which are delimited by a…
alamodey
  • 14,320
  • 24
  • 86
  • 112
7
votes
3 answers

FLEX: Is there a way to return multiple tokens at once

In flex, I want to return multiple tokens for one match of a regular expression. Is there a way to do this?
Eburetto
  • 213
  • 2
  • 9
7
votes
4 answers

How to turn a token stream into a parse tree

I have a lexer built that streams out tokens from an input, but I'm not sure how to build the next step in the process: the parse tree. Does anybody have any good resources or examples on how to accomplish this?
Evan Fosmark
  • 98,895
  • 36
  • 105
  • 117
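One standard next step is a recursive-descent parser: one function per grammar rule, each consuming tokens from the front of the stream and returning a tree node. A minimal Python sketch for a hypothetical grammar `expr -> NUM (('+'|'-') NUM)*`:

```python
# Recursive descent: each grammar rule becomes a function that takes the
# remaining token list and returns (subtree, leftover tokens).
def parse_expr(tokens):
    tree, rest = parse_num(tokens)
    while rest and rest[0] in ("+", "-"):
        op = rest[0]
        right, rest = parse_num(rest[1:])
        tree = (op, tree, right)          # builds a left-associative tree
    return tree, rest

def parse_num(tokens):
    if not tokens or not tokens[0].isdigit():
        raise SyntaxError("expected a number")
    return int(tokens[0]), tokens[1:]
```

The dragon book's syntax-analysis chapters cover this construction (and its table-driven alternatives) in depth.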
7
votes
2 answers

Syntactic predicates in ANTLR lexer rules

Introduction Looking at the documentation, ANTLR 2 used to have something called predicated lexing, with examples like this one (inspired by Pascal): RANGE_OR_INT : ( INT ".." ) => INT { $setType(INT); } | ( INT '.' ) => REAL {…
MvG
  • 57,380
  • 22
  • 148
  • 276
7
votes
1 answer

Which special characters must be escaped when using Python regex module re?

I'm using the Python module re to write regular expressions for lexical analysis. I've searched, to no avail, for a comprehensive list of which special characters must be escaped in order to be matched literally by the regex. Can someone please point…
Victor Brunell
  • 5,668
  • 10
  • 30
  • 46
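For the record, the standard library answers this one directly: `re.escape` escapes whatever needs escaping, so no hand-maintained list is required (since Python 3.7 it escapes only characters that actually have special meaning):

```python
import re

pattern = re.escape("1+1")                # r"1\+1": metacharacters escaped
assert re.fullmatch(pattern, "1+1")       # matches the literal text
assert not re.fullmatch(pattern, "111")   # '+' no longer means "repeat"
```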
7
votes
1 answer

Return multiple tokens in ocamllex

Is there any way to return multiple tokens in OCamlLex? I'm trying to write a lexer and parser for an indentation based language, and I would like my lexer to return multiple DEDENT tokens when it notices that the indentation level is less than it…
Joe Bloggs
  • 571
  • 3
  • 6
  • 14
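The usual workaround, whatever the lexer generator, is a queue of pending tokens: the scanner pushes several DEDENTs at once and the token function drains that queue before reading more input (CPython's own tokenizer works this way; in ocamllex you would keep a mutable queue alongside the lexbuf). A hedged Python sketch of the indentation-stack part, assuming space-only, consistent indentation:

```python
# Compare each line's indentation against a stack of open levels and
# emit one DEDENT per level closed -- possibly several from one line.
def indent_tokens(lines):
    stack = [0]                      # currently open indentation widths
    for line in lines:
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            yield "INDENT"
        while width < stack[-1]:     # one DEDENT per closed level
            stack.pop()
            yield "DEDENT"
        yield ("LINE", line.strip())
    while len(stack) > 1:            # close everything still open at EOF
        stack.pop()
        yield "DEDENT"
```

Dropping from two levels of nesting straight back to column zero produces two consecutive DEDENT tokens, which is exactly the multi-token case the question describes.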
7
votes
2 answers

Prolog DCG: Writing programming language lexer

I'm trying for the moment to keep my lexer and parser separate, based on the vague advice of the book Prolog and Natural Language Analysis, which really doesn't go into any detail about lexing/tokenizing. So I am giving it a shot and seeing several…
Daniel Lyons
  • 22,421
  • 2
  • 50
  • 77
7
votes
1 answer

Character position in scanner using Lex/Flex

In Lex/Flex is there a way to get the position in the character stream (from the start of the file) that a token appears at? Kind of like yylineno except that it returns the character position as an integer? If not, what's the best way to get at…
ChrisDiRulli
  • 1,482
  • 8
  • 19
  • 28
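In flex the usual trick (hedged, from memory) is to keep the running offset yourself, e.g. by adding `yyleng` to a global counter in `YY_USER_ACTION` before each rule's action runs, since there is no character-offset analogue of `yylineno` built in. The same bookkeeping in a hand-rolled Python scanner, where `re` match objects carry the offset for free:

```python
import re

TOKEN = re.compile(r"\S+")   # toy token: any run of non-space characters

def tokens_with_offsets(text):
    # m.start() is the absolute character position from the start of the
    # input -- the analogue of a running, yyleng-accumulated counter.
    for m in TOKEN.finditer(text):
        yield (m.group(), m.start())
```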
7
votes
3 answers

How to recognize words in text with non-word tokens?

I am currently parsing a bunch of mails and want to get words and other interesting tokens out of mails (even with spelling errors or combination of characters and letters, like "zebra21" or "customer242"). But how can I know that…
zebra
  • 1,330
  • 1
  • 13
  • 26
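One simple approach to the mixed tokens mentioned ("zebra21", "customer242"): tokenize on alphanumeric runs so letter-digit combinations survive as single tokens, then filter. A hedged sketch; the keep-anything-with-a-letter rule is an assumption, not the asker's spec:

```python
import re

TOKEN = re.compile(r"[A-Za-z0-9]+")

def interesting_tokens(text):
    # Keep any run containing at least one letter, so "zebra21" and
    # "customer242" survive while bare numbers are dropped.
    for tok in TOKEN.findall(text):
        if any(c.isalpha() for c in tok):
            yield tok
```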
7
votes
3 answers

Lexical Analysis of Python Programming Language

Does anyone know where a FLEX or LEX specification file for Python exists? For example, this is a lex specification for the ANSI C programming language: http://www.quut.com/c/ANSI-C-grammar-l-1998.html FYI, I am trying to write code highlighting…
pokstad
  • 3,411
  • 3
  • 30
  • 39
7
votes
3 answers

Parsing Python function calls to get argument positions

I want code that can analyze a function call like this: whatever(foo, baz(), 'puppet', 24+2, meow=3, *meowargs, **meowargs) And return the positions of each and every argument, in this case foo, baz(), 'puppet', 24+2, meow=3, *meowargs,…
Ram Rachum
  • 84,019
  • 84
  • 236
  • 374
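For this particular task the standard library already does the lexing and parsing: `ast.parse` yields every argument as a node carrying its source position. A sketch (Python 3.8+ for `ast.get_source_segment`):

```python
import ast

src = "whatever(foo, baz(), 'puppet', 24+2, meow=3, *meowargs, **meowargs)"
call = ast.parse(src, mode="eval").body   # the Call node

# Positional and *starred arguments, each with its 0-based column offset.
for arg in call.args:
    print(ast.get_source_segment(src, arg), "at column", arg.col_offset)

# Keyword and **kwargs arguments live in call.keywords
# (kw.arg is None for a **kwargs entry).
for kw in call.keywords:
    print(kw.arg, "=", ast.get_source_segment(src, kw.value))
```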
7
votes
1 answer

Are there any off-the-shelf solutions for lexical analysis in Haskell that allow for a run-time dynamic lexicon?

I'm working on a small Haskell project that needs to be able to lex a very small subset of strictly formed English in to tokens for semantic parsing. It's a very naïve natural language interface to a system with many different end effectors than…
Doug Stephen
  • 7,181
  • 1
  • 38
  • 46
7
votes
2 answers

Why won't Parsec consider the right-hand side of my <|> alternative?

I’m trying to parse C++ code. Therefore, I need a context-sensitive lexer. In C++, >> is either one or two tokens (>> or > >), depending on the context. To make it even more complex, there is also a token >>= which is always the same regardless of…
user142019