Questions tagged [lexical-analysis]

Process of converting a sequence of characters into a sequence of tokens.

In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. A program or function that performs lexical analysis is called a lexical analyzer, lexer, tokenizer, or scanner.

The lexical syntax is usually a regular language, whose atoms are individual characters, while the phrase syntax is usually a context-free language, whose atoms are words (tokens produced by the lexer). This separation is common but not mandatory: in scannerless parsing, the lexer is merged into the parser.
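As a rough illustration of the description above, here is a minimal sketch of a regex-based lexer in Python. The token names (IDENT, ASSIGN, NUMBER), the Token type, and the tokenize function are assumptions made for this example only; they do not come from any particular tool or from the questions listed below.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # token category, e.g. "IDENT"
    text: str  # the exact characters that were matched


# Hypothetical token set for a toy language; order matters because
# the first alternative that matches at a position wins.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("SKIP", r"\s+"),      # insignificant whitespace
    ("MISMATCH", r"."),    # any other single character
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Convert a sequence of characters into a sequence of tokens."""
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind == "SKIP":
            continue  # drop whitespace instead of emitting a token
        if kind == "MISMATCH":
            raise SyntaxError(
                f"unexpected character {match.group()!r} at offset {match.start()}"
            )
        yield Token(kind, match.group())


if __name__ == "__main__":
    for token in tokenize("foo = 42"):
        print(token)
```

For the input foo = 42 this sketch yields three tokens: an IDENT for foo, an ASSIGN for =, and a NUMBER for 42. Each pattern is a regular expression, which mirrors the point that lexical syntax is usually a regular language; a parser would then consume these tokens as its atoms.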

843 questions
5
votes
2 answers

Arabic lemmatization and Stanford NLP

I am trying to perform lemmatization, i.e. identifying the lemma and possibly the Arabic root of a verb. For example: يتصل ==> lemma (infinitive of the verb) ==> اتصل ==> root (triliteral root / Jidr thoulathi) ==> و ص ل Do you think Stanford NLP can do…
5
votes
4 answers

Responsibilities of the Lexer and the Parser

I'm currently implementing a lexer for a simple programming language. So far, I can tokenize identifiers, assignment symbols, and integer literals correctly; in general, whitespace is insignificant. For the input foo = 42, three tokens are…
Marius Schulz
5
votes
1 answer

Bison-Flex extern FILE *yyin isn't working (C language)

I know that in flex you just have to do yyin = fopen(filename, "r"); to read a file, but how is that possible from bison? I'm trying to combine flex and bison for my purpose (read a file with 4 + 5 + 7; and print the outcome) but I…
captain monk
5
votes
1 answer

Flex/bison syntax error

I am trying to write a grammar which will be able to consume the following input: begin #this is a example x = 56; while x > 0 do begin point 15.6 78.96; end; end; Here is the lexer.l file: %option noyywrap %{ #include…
Vardan Hovhannisyan
5
votes
3 answers

Removing nested comments by lex

How should I write a program in lex (or flex) that removes nested comments from text and prints only the text that is not in comments? I should probably somehow recognize the states when I am in a comment and track the number of starting "tags" of the block comment. Let's…
user1097772
5
votes
3 answers

What is the purpose of a lexer?

I was reading the answer to this question. I can't seem to find the answer to why someone would need a lexer separately. Is it one of the steps a program goes through during compilation? Can someone please explain in simple terms why I would need a…
Anirudh Ramanathan
5
votes
5 answers

What are lexical and syntactic analysis in the compilation process of a C compiler?

What are lexical and syntactic analysis in the process of compiling? Does preprocessing happen after lexical and syntactic analysis?
Raulp
4
votes
6 answers

Is the C++ compiler really smart enough to distinguish between multiply and dereference?

I have the following line of code: double *resultOfMultiplication = new double(*num1 * *num2); How does the compiler know which * is used for dereferencing and which * is used for multiplication? Also, and probably a more important question is in…
Nosrettap
4
votes
2 answers

How to make flex (lexical scanner) read UTF-8 character input?

It seems that flex doesn't support UTF-8 input. Whenever the scanner encounters a non-ASCII char, it stops scanning as if it were an EOF. Is there a way to force flex to eat my UTF-8 chars? I don't want it to actually match UTF-8 chars, just eat…
Martin Cote
4
votes
3 answers

Expression parsing: how to tokenize

I'm looking to tokenize Java/JavaScript-like expressions in JavaScript code. My input will be a string containing the expression, and the output needs to be an array of tokens. What's the best practice for doing something like this? Do I need to…
levik
4
votes
2 answers

Simple lexical analysis Java program

My little project is a lexical analysis program in which I have to take every word found in an arbitrary .java file and list every line it appears on in the file. I need to have one lookup table dedicated just to the reserved words and another for…
user1152918
4
votes
2 answers

Java library to parse regular expressions into a syntax tree

I'd like a library that can take the string representation of a regexp and convert that into a syntax tree for easy programmatic manipulation. Something that would transform: (\s?)bla[a-z] into something like: PARENTHESIS CHAR:SPACE …
jp.
4
votes
3 answers

How to capture a string without quote characters

I'm trying to capture quoted strings without the quotes. I have this terminal: %token STRING, and this production: constant: | QUOTE STRING QUOTE { String($2) }, along with these lexer rules: | '\'' { QUOTE } | [^ '\'']* { STRING…
Daniel
4
votes
1 answer

Profiling Regex Lexer

I've created a router in PHP which takes a DSL (based on the Rails 3 route) and converts it to a regex. It has optional segments (denoted by (nested) parentheses). The following is the current lexing algorithm: private function…
efritz
4
votes
4 answers

Is this the job of the lexer?

Let's say I was lexing a Ruby method definition: def print_greeting(greeting = "hi") end Is it the lexer's job to maintain state and emit relevant tokens, or should it be relatively dumb? Notice in the above example the greeting param has a…
ryeguy