Questions tagged [lexical-analysis]

In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. A program or function that performs lexical analysis is called a lexical analyzer, lexer, tokenizer, or scanner.

The lexical syntax is usually a regular language, whose atoms are individual characters, while the phrase syntax is usually a context-free language, whose atoms are words (tokens produced by the lexer). While this is a common separation, alternatively, a lexer can be combined with the parser in scannerless parsing.
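
As a rough illustration of the definition above, here is a minimal sketch of a regex-driven lexer in Python. The token classes (NUMBER, IDENT, OP, SKIP) and the tokenize function are hypothetical names chosen for this example and do not come from any particular tool. Note that Python's alternation takes the first alternative that matches, not the longest match that generators such as Flex use, so the order of the rules matters here.

```python
import re

# Hypothetical token classes for a toy expression language (illustrative only).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]

# One master pattern with a named group per token class.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs; raise SyntaxError on unmatched input."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # whitespace is recognized but not emitted
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Under these assumptions, tokenize("x = 3 + 4.5") yields five tokens: an IDENT, an OP, a NUMBER, another OP, and a NUMBER. A parser would then consume that token list, not the raw characters.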

843 questions

1 answer

Order of precedence for token matching in Flex

My apologies if the title of this thread is a little confusing. What I'm asking about is how Flex (the lexical analyzer) handles issues of precedence. For example, let's say I have two tokens with similar regular expressions, written in the…

tokenize flex-lexer lexical-analysis

asked Jul 18 '11 at 17:11

Casey Patton

2 answers

How to ignore comments inside string literals

I'm writing a lexer as part of a university course. One of the brain teasers (extra assignments that don't contribute to the scoring) our professor gave us is how we could implement comments inside string literals. Our string literals start and end…

python regex lexical-analysis ply

asked Oct 05 '20 at 14:37

Konsta

3 answers

How to make C language context-free?

I know that C is not a context-free language; a famous example is: int foo; typedef int foo; foo x; In this case the lexer doesn't know whether foo in the 3rd line is an identifier or a typedef. My question is, is this the only reason that makes C…

c parsing compiler-construction lexical-analysis

asked May 23 '17 at 17:45

Bite Bytes

1 answer

How to use yylval in flex

I'm trying to build a lexical analyser with Flex on Windows. I always get an error: "undefined reference to `yylval'" I declared yylval as an extern type up where all definitions are made as follows: %option noyywrap %{ …

flex-lexer lex lexical-analysis

asked Apr 13 '17 at 15:21

ofer gertz

2 answers

Using flex in c and regular expressions

I am trying to create a lexical analyzer for a compiler. But I have a problem using regular expressions to find things like keywords and real numbers. For example, some definitions: id [aA-zZ][aA-zZ-0-9_]* keyword …

c regex compiler-construction flex-lexer lexical-analysis

asked Feb 06 '15 at 14:52

user2241915

4 answers

How can I modify the text of tokens in a CommonTokenStream with ANTLR?

I'm trying to learn ANTLR and at the same time use it for a current project. I've gotten to the point where I can run the lexer on a chunk of code and output it to a CommonTokenStream. This is working fine, and I've verified that the source text is…

compiler-construction antlr antlr3 lexical-analysis

asked Feb 09 '10 at 11:57

mmcdole

4 answers

Is string "1a" an error for lexical analyser or not?

I am making a basic lexical analyser in Java for my semester project and I disagree with my subject teacher on a concept. My view is that in general if an input like "1a" is given to a lexical analyser then it should give output as:…

java programming-languages lexical-analysis

asked May 29 '13 at 16:52

Cheeta

2 answers

jFlex error: class throws java.io.IOException

I have written a very simple file with the specification shown below to tokenize words: %% %class Lexer %unicode WORD = [^\r\n\t ] %% {WORD} {System.out.println("Word is:"+yytext());} . {System.out.println("Bad character: "+…

java lexical-analysis jflex

asked Apr 02 '13 at 20:08

Aman Deep Gautam

4 answers

How can I use text analysis in order to investigate questionnaire responses?

I'm the "programmer" of a team of pupils that aims to investigate satisfaction and general problems in my grammar school. We have a questionnaire that is built upon a scale from 1-6 and we interpret these answers with diagram software that I wrote in…

python statistics computer-science lexical-analysis text-analysis

asked Dec 09 '12 at 10:26

Simon F

2 answers

PHP code analyzer to determine classes/extensions used

Problem I have a legacy codebase I need to analyze and determine dependencies. Particularly the dependencies on classes (internal/external) and extensions (Memcache, PDO, etc). What I've Tried I have reviewed the tools listed in Is there a static…

php code-analysis lexical-analysis

asked Oct 26 '12 at 14:17

Jason McCreary

1 answer

Haskell Parsec - error messages are less helpful while using custom tokens

I'm working on separating the lexing and parsing stages of a parser. After some tests, I realized error messages are less helpful when I'm using some tokens other than Parsec's Char tokens. Here are some examples of Parsec's error messages while using…

parsing haskell lexical-analysis parsec

asked Aug 28 '12 at 20:50

sinan

1 answer

Get character offsets for elements in jsoup

I need to map jsoup elements back to specific character offsets in the source HTML. In other words, if I have HTML that looks like this: Hello World I need to know that "Hello " starts at offset 0 and has a length of 6 characters, …

jsoup lexical-analysis

asked Jul 08 '12 at 23:09

ccleve

3 answers

Start states in Lex / Flex

I'm using Flex and Bison for a parser generator, but I'm having problems with the start states in my scanner. I'm using exclusive rules to deal with commenting, but this grammar doesn't seem to match quoted tokens: %x COMMENT // {…

parsing bison lex lexical-analysis flex-lexer

asked Jul 15 '09 at 10:27

Dan

5 answers

Simplify regular expression for time literals (like "10h50m")

I am writing lexer rules for a custom description language using pyLR1, which should include time literals such as: 10h30m # meaning 10 hours + 30 minutes 5m30s # meaning 5 minutes + 30 seconds 10h20m15s # meaning 10 hours + 20…

regex parsing time lexical-analysis

asked Jul 02 '12 at 11:49

Jonas Schäfer

1 answer

error handling in YACC

Hi there, I'm trying to make a simple parser using lex and yacc. The thing is, I want to print my own error messages instead of the error symbol used by yacc, which prints "syntax error". For example, this is my yacc code: %{ #include #include…

parsing yacc lex lexical-analysis

asked Mar 20 '12 at 23:34

quartaela
