Questions tagged [lexical-analysis]


In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. A program or function that performs lexical analysis is called a lexical analyzer, lexer, tokenizer, or scanner.

The lexical syntax is usually a regular language, whose atoms are individual characters, while the phrase syntax is usually a context-free language, whose atoms are words (tokens produced by the lexer). This separation is common but not required: in scannerless parsing, the lexer is combined with the parser.
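
As a minimal sketch of the character-to-token conversion described above, the following Python snippet tokenizes a tiny arithmetic language with regular expressions. The token kinds (`NUMBER`, `IDENT`, `OP`, `SKIP`) and the `Token` type are hypothetical names chosen for illustration, not part of any particular tool.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # token category, e.g. "NUMBER"
    text: str  # the matched lexeme
    pos: int   # offset of the lexeme in the source


# Hypothetical lexical syntax: each token kind is a regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[+\-*/()=]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pat})" for kind, pat in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens left to right; whitespace is matched but not emitted."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":
            yield Token(m.lastgroup, m.group(), pos)
        pos = m.end()
```

A parser would then consume the resulting `Token` stream as its atoms, for example `list(tokenize("x = 3 + 42"))`.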

843 questions
4
votes
2 answers

Is it possible to call C# lexical/syntactic analyzers without compilation?

Consider this question on SO, where the whole C# in-memory compiler is called. Suppose only lexical and syntactic analysis is required: parse the text as a stream of lexemes, check them, and exit. Is it possible in the current version of…
abatishchev
4
votes
1 answer

Real life usage of languages analysis?

I could easily use parts of a compiler (e.g. scanning, parsing, syntax analysis) to write my own compiler or a code analyzer (for generating class diagrams and the like)... but there are some other uses of those algorithms and tools (apart from natural…
4
votes
2 answers

Are back references possible in flex (lexical analyser)?

I'm used to playing with regexps in languages where I can use parentheses to capture references. The closest thing I can see in flex is the yytext variable, but its contents are the full matched regexp and not just some part of it. Isn't…
rfgamaral
4
votes
3 answers

Best way to implement a meta language compiling down to PHP

I've been working on the specification / kitchensink for a meta language that can compile down to PHP for some time now, and I want to begin building the thing. Previously I have implemented tiny DSLs using PHP_Lexergenerator and PHP_Parsergenerator…
Rune Kaagaard
4
votes
4 answers

Lexers/tokenizers and character sets

When constructing a lexer/tokenizer is it a mistake to rely on functions (in C) such as isdigit/isalpha/...? They are dependent on locale as far as I know. Should I pick a character set and concentrate on it and make a character mapping myself from…
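
One way to sidestep locale-dependent classification, sketched here in Python rather than C, is to test characters against explicit ASCII ranges. The helper names `is_ascii_digit` and `is_ascii_alpha` are hypothetical, and whether ASCII-only classification is appropriate depends on the language being lexed.

```python
def is_ascii_digit(ch: str) -> bool:
    """True only for a single character in '0'..'9'."""
    return len(ch) == 1 and "0" <= ch <= "9"


def is_ascii_alpha(ch: str) -> bool:
    """True only for a single character in 'a'..'z' or 'A'..'Z'."""
    return len(ch) == 1 and ("a" <= ch <= "z" or "A" <= ch <= "Z")


# The built-in predicates accept far more than ASCII, which is the
# Python analogue of the surprise the question is worried about:
# "\u0663" is ARABIC-INDIC DIGIT THREE and "\u00e9" is "e" with an acute accent.
assert "\u0663".isdigit() and not is_ascii_digit("\u0663")
assert "\u00e9".isalpha() and not is_ascii_alpha("\u00e9")
```

Explicit ranges make the lexer's behavior independent of the environment it runs in, at the cost of having to extend the ranges deliberately if non-ASCII identifiers are wanted.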
4
votes
2 answers

How to store tokens during lexical analysis

I'm trying to design a compiler, and am at lexical analysis. Say I take a simple "Hello World!" program as a file of strings and extract tokens from it. What is the best way to store these tokens? In a single data structure, or two or more data…
GothamCityRises
4
votes
2 answers

Could not load main class in JavaCC

I am an AI student and we work with JavaCC. I am new to it. I was trying a simple example and got some errors. 1) I downloaded JavaCC 0.6 from its website 2) I extracted it to drive C 3) I wrote this code in a file with the extension…
user2970269
4
votes
1 answer

semi-reserved words handling in flex/bison

Consider this lex.l file: %{ #include "y.tab.h" %} digit [0-9] letter [a-zA-Z] %% "+" { return PLUS; } "-" { return MINUS; } "*" { return TIMES; } "/" …
Vardan Hovhannisyan
4
votes
1 answer

Possible typos in ECMAScript 5 specification?

Does anybody know why, at the end of section 7.6 of the ECMA-262, 5th Edition specification, the nonterminals UnicodeLetter, UnicodeCombiningMark, UnicodeDigit, UnicodeConnectorPunctuation, and UnicodeEscapeSequence are not followed by two…
Andy West
4
votes
1 answer

Eliminating left recursion for E := EE+|EE-|id

How to eliminate left recursion for the following grammar? E := EE+|EE-|id Using the common procedure: A := Aa|b translates to: A := b|A' A' := ϵ| Aa Applying this to the original grammar we get: A = E, a = (E+|E-) and b = id Therefore: E :=…
4
votes
1 answer

Parse a string by Lexing.from_string

I have implemented this example, and it works well. Now, I want to read from a string instead of reading from stdin, so I change the calc.ml: let _ = try let lexbuf = Lexing.from_string "1+3" in let result = Parser.main Lexer.token lexbuf…
SoftTimur
4
votes
2 answers

Lexical analyzer (java) for HTML Markdown source code

I do not even know where to begin writing the character-by-character lexical analyzer. I wrote BNF grammar rules for a Markdown language (specifically, HTML) based on rules and specifics I was given, so none should need to be added. I now have to…
4
votes
1 answer

DFA based regular expression matching - how to get all matches?

I have a given DFA that represents a regular expression. I want to match the DFA against an input stream and get all possible matches back, not only the leftmost-longest match. For example: regex: a*ba|baa input:…
youllknow22
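
A minimal sketch of one way to enumerate every match: restart the DFA at each input offset and record a match each time an accepting state is reached, instead of stopping at the longest one. The transition table below is a hand-built DFA for `a*ba|baa`; the state numbering, the function name `all_matches`, and the sample input `"abaa"` are assumptions for illustration, since the question's own input is elided.

```python
from typing import Dict, List, Set, Tuple

# Hand-built DFA for a*ba|baa (state numbers are arbitrary):
#   0 start, 1 after a+, 2 after "b", 3 after a+ b,
#   4 after "ba" (accepting), 5 after a+ba or "baa" (accepting).
TRANSITIONS: Dict[Tuple[int, str], int] = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 1, (1, "b"): 3,
    (2, "a"): 4,
    (3, "a"): 5,
    (4, "a"): 5,
}
ACCEPTING: Set[int] = {4, 5}


def all_matches(text: str,
                transitions: Dict[Tuple[int, str], int] = TRANSITIONS,
                accepting: Set[int] = ACCEPTING,
                start: int = 0) -> List[Tuple[int, int]]:
    """Return (begin, end) offsets of every substring the DFA accepts."""
    matches = []
    for begin in range(len(text)):
        state = start
        for end in range(begin, len(text)):
            state = transitions.get((state, text[end]))
            if state is None:      # no transition: dead end for this start
                break
            if state in accepting:  # record, but keep scanning for longer matches
                matches.append((begin, end + 1))
    return matches
```

For `"abaa"` this reports the substrings `aba`, `ba`, and `baa`. The approach is quadratic in the worst case, and it does not report empty matches, which is harmless here because the start state is not accepting.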
4
votes
2 answers

How to unlex using Flex (The Fast Lexical Analyzer)?

Is there any way to put a token back into the input stream using Flex? I imagine some function like yyunlex().
mrk
4
votes
2 answers

How does compiler handle line number in runtime error message

Almost all compilers return a line number along with an error message. From a compiler design perspective, how does a compiler handle line numbers in each of the following phases? Thanks. Scanner Parser AST data…
Simon Guo