Questions tagged [lexical-analysis]

Process of converting a sequence of characters into a sequence of tokens.

In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. A program or function that performs lexical analysis is called a lexical analyzer, lexer, tokenizer, or scanner.

The lexical syntax is usually a regular language, whose atoms are individual characters, while the phrase syntax is usually a context-free language, whose atoms are words (tokens produced by the lexer). This separation is common but not required: in scannerless parsing, the lexer is combined with the parser.
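
As a rough illustration of the description above, here is a minimal sketch of a regex-based lexer in Python. The token names (`NUMBER`, `IDENT`, `OP`, `SKIP`) and their patterns are assumptions made up for this example, not part of any particular language or tool; a real lexer would likely also track line numbers and handle many more token kinds.

```python
import re

# Hypothetical token names and patterns, chosen only for illustration.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]

# One combined pattern; each alternative is a named group, so
# match.lastgroup tells us which token kind matched.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs; whitespace is dropped."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("x = 42 + y"))
```

The resulting list of tokens is what a parser would then consume as its atoms, in place of individual characters.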

843 questions
15
votes
2 answers

premature eof error in flex file

I have the following code, and it gives the error "hello.l", line 31: premature EOF when I run the command flex hello.l: %{ #include #include "y.tab.h" %} %% ("hi"|"oi")"\n" {return HI; } ("tchau"|"bye")"\n" …
Waseem
14
votes
1 answer

Why parser-generators instead of just configurable-parsers?

The title sums it up. Presumably anything that can be done with source-code-generating parser-generators (which essentially hard-code the grammar-to-be-parsed into the program) can be done with a configurable parser (which would maintain the…
Li Haoyi
13
votes
4 answers

How do you implement syntax highlighting?

I am embarking on some learning and I want to write my own syntax highlighting for files in C++. Can anyone give me ideas on how to go about doing this? To me it seems that when a file is opened: It would need to be parsed and decided what type…
MLS
12
votes
4 answers

PHP Lexer and Parser Generator?

I know the question "Lex and Yacc in PHP" was asked before, but that was a year ago. Is there any new, mature PHP parser generator now? My searches led me to the following ones. What do you think about them, and are there any others? code.google.com/p/antlrphpruntime/ : The…
Nicolas Thery
12
votes
1 answer

Meaning of yywrap() in flex

What do these instructions mean in flex (lex): #define yywrap() 1 and [ \t]+$ ? I found them in the code below: (%% [ \t]+ putchar('_'); [ \t]+% %% input "hello world" output "hello_world" )
12
votes
5 answers

Python - lexical analysis and tokenization

I'm looking to speed along my discovery process here quite a bit, as this is my first venture into the world of lexical analysis. Maybe this is even the wrong path. First, I'll describe my problem: I've got very large properties files (in the order…
Philip Reynolds
12
votes
2 answers

Must &= always be interpreted as an operator?

I was coding and accidentally left out a space between a constant reference and its default value. I was surprised to see that it came up as an error in Intellisense, so I compiled it, and sure enough, it doesn't work in GCC 4.3.4, 4.5.1, or 4.7.2,…
chris
11
votes
2 answers

How does the C/C++ compiler distinguish the uses of the * operator (pointer, dereference operator, multiplication operator)?

How, in the C and C++ languages, can the compiler distinguish * when it is used as a pointer (MyClass* class), as a multiplication operator (a * b), or as a dereferencing operator (*my_var)?
Pinnaker
11
votes
5 answers

Installing flex (lexical analyzer) on Mac

Can someone tell me how I can install flex (lexical analyzer) on my Mac? I searched everywhere on Google and I can't find it. I have the universal binary and I extracted it to my desktop, but I have no idea where to go from here. Any help would be…
user635064
11
votes
2 answers

What is a regular expression for control characters?

I'm trying to match a control character in the form \^c where c is any valid character for control characters. I have this regular expression, but it's not currently working: \\[^][@-z] I think the problem lies with the fact that the caret character…
Cameron Tinker
11
votes
4 answers

How can I find only 'interesting' words from a corpus?

I am parsing sentences. I want to know the relevant content of each sentence, defined loosely as "semi-unique words" in relation to the rest of the corpus. Something similar to Amazon's "statistically improbable phrases", which seem to (often)…
Alex Mcp
11
votes
1 answer

How do Haskell compilers implement the parse-error(t) rule in practice?

The Haskell Report includes a somewhat notorious clause in the layout rules called "parse-error(t)". The purpose of this rule is to avoid forcing the programmer to write braces in single-line let expressions and similar situations. The relevant…
Aaron Rotenberg
11
votes
3 answers

How does the C compiler parse the following C statement?

Consider the following lines: int i; printf("%d",i); Will the lexical analyzer go into the string to parse % and d as separate tokens, or will it parse "%d" as one token?
Yogesh Mittal
11
votes
3 answers

How would you go about implementing off-side rule?

I've already written a generator that does the trick, but I'd like to know the best possible way to implement the off-side rule. In short: the off-side rule means, in this context, that indentation is recognized as a syntactic element. Here is the…
11
votes
4 answers

C#/.NET Lexer Generators

I'm looking for a decent lexical scanner generator for C#/.NET -- something that supports Unicode character categories, and generates somewhat readable & efficient code. Anyone know of one? EDIT: I need support for Unicode categories, not just…
Alex Lyman