Questions tagged [lexical-analysis]

Process of converting a sequence of characters into a sequence of tokens.

In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. A program or function that performs lexical analysis is called a lexical analyzer, lexer, tokenizer, or scanner.

The lexical syntax is usually a regular language whose atoms are individual characters, while the phrase syntax is usually a context-free language whose atoms are words (the tokens produced by the lexer). This separation is common but not required: in scannerless parsing, the lexer is combined with the parser.
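
As a rough illustration of the character-to-token conversion described above, here is a minimal sketch of a regex-driven lexer in Python. The token classes (`NUMBER`, `IDENT`, `OP`) and the `tokenize` helper are hypothetical names chosen for this example and are not taken from any particular tool. Python's `re` alternation tries alternatives in the order they are listed, so this sketch does not reproduce the longest-match rule that generators such as Flex apply.

```python
import re

# Hypothetical token classes for a tiny expression language (illustrative only).
# Order matters: Python tries the alternatives left to right.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),           # integer or decimal literal
    ("IDENT",  r"[A-Za-z_][A-Za-z0-9_]*"),  # identifier
    ("OP",     r"[+\-*/=()]"),              # single-character operator or paren
    ("SKIP",   r"\s+"),                     # whitespace, discarded
    ("ERROR",  r"."),                       # any other single character
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs for the given character sequence."""
    tokens = []
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise ValueError(
                f"unexpected character {match.group()!r} at offset {match.start()}"
            )
        tokens.append((kind, match.group()))
    return tokens
```

With these rules, an input such as `1a` is split into a `NUMBER` followed by an `IDENT` rather than rejected. Whether that should count as a lexical error is a design decision for the language being implemented, not something the technique itself settles.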

843 questions
6 votes, 1 answer

Order of precedence for token matching in Flex

My apologies if the title of this thread is a little confusing. What I'm asking is how Flex (the lexical analyzer) handles issues of precedence. For example, let's say I have two tokens with similar regular expressions, written in the…
Casey Patton
  • 4,021
  • 9
  • 41
  • 54
6 votes, 2 answers

How to ignore comments inside string literals

I'm writing a lexer as part of a university course. One of the brain teasers (extra assignments that don't contribute to the scoring) our professor gave us is how we could implement comments inside string literals. Our string literals start and end…
Konsta
  • 79
  • 1
  • 8
6 votes, 3 answers

How to make C language context-free?

I know that C is not a context-free language; a famous example is: int foo; typedef int foo; foo x; In this case the lexer doesn't know whether foo in the 3rd line is an identifier or a typedef. My question is, is this the only reason that makes C…
Bite Bytes
  • 1,455
  • 8
  • 24
6 votes, 1 answer

How to use yylval in flex

I'm trying to build a lexical analyser with Flex on Windows. I always get an error: "undefined reference to `yylval'". I declared yylval as an extern type up where all the definitions are made, as follows: %option noyywrap %{ …
ofer gertz
  • 89
  • 1
  • 6
6 votes, 2 answers

Using flex in c and regular expressions

I am trying to create a lexical analyzer for a compiler, but I have a problem using regular expressions to find things like keywords and real numbers. For example, some definitions: id [aA-zZ][aA-zZ-0-9_]* keyword …
6 votes, 4 answers

How can I modify the text of tokens in a CommonTokenStream with ANTLR?

I'm trying to learn ANTLR and at the same time use it for a current project. I've gotten to the point where I can run the lexer on a chunk of code and output it to a CommonTokenStream. This is working fine, and I've verified that the source text is…
mmcdole
  • 91,488
  • 60
  • 186
  • 222
6 votes, 4 answers

Is string "1a" an error for lexical analyser or not?

I am making a basic lexical analyser in Java for my semester project, and I disagree with my subject teacher on a concept. My view is that, in general, if an input like "1a" is given to a lexical analyser, then it should give output as:…
Cheeta
  • 429
  • 1
  • 8
  • 17
6 votes, 2 answers

jFlex error: class throws java.io.IOException

I have written a very simple file with the specification shown below to tokenize words: %% %class Lexer %unicode WORD = [^\r\n\t ] %% {WORD} {System.out.println("Word is:"+yytext());} . {System.out.println("Bad character: "+…
Aman Deep Gautam
  • 8,091
  • 21
  • 74
  • 130
6 votes, 4 answers

How can I use text analysis in order to investigate questionnaire responses?

I'm the "programmer" of a team of pupils that aims to investigate satisfaction and general problems in my grammar school. We have a questionnaire that is built upon a scale from 1-6, and we interpret these answers with diagram software that I wrote in…
6 votes, 2 answers

PHP code analyzer to determine classes/extensions used

Problem: I have a legacy codebase I need to analyze and determine dependencies. Particularly the dependencies on classes (internal/external) and extensions (Memcache, PDO, etc). What I've tried: I have reviewed the tools listed in Is there a static…
Jason McCreary
  • 71,546
  • 23
  • 135
  • 174
6 votes, 1 answer

Haskell Parsec - error messages are less helpful while using custom tokens

I'm working on separating the lexing and parsing stages of a parser. After some tests, I realized error messages are less helpful when I'm using tokens other than Parsec's Char tokens. Here are some examples of Parsec's error messages while using…
sinan
  • 6,809
  • 6
  • 38
  • 67
6 votes, 1 answer

Get character offsets for elements in jsoup

I need to map jsoup elements back to specific character offsets in the source HTML. In other words, if I have HTML that looks like this: Hello
World I need to know that "Hello " starts at offset 0 and has a length of 6 characters,
ccleve
  • 15,239
  • 27
  • 91
  • 157
6 votes, 3 answers

Start states in Lex / Flex

I'm using Flex and Bison for a parser generator, but having problems with the start states in my scanner. I'm using exclusive rules to deal with commenting, but this grammar doesn't seem to match quoted tokens: %x COMMENT // {…
Dan
  • 33,953
  • 24
  • 61
  • 87
6 votes, 5 answers

Simplify regular expression for time literals (like "10h50m")

I am writing lexer rules for a custom description language using pyLR1, which shall include time literals, for example: 10h30m # meaning 10 hours + 30 minutes 5m30s # meaning 5 minutes + 30 seconds 10h20m15s # meaning 10 hours + 20…
Jonas Schäfer
  • 20,140
  • 5
  • 55
  • 69
5 votes, 1 answer

error handling in YACC

Hi there, I'm trying to make a simple parser using lex and yacc. The thing is, I want to print my own error messages rather than the error symbol used by yacc, which prints "syntax error". For example, this is my yacc code: %{ #include #include…
quartaela
  • 2,579
  • 16
  • 63
  • 99