Questions tagged [lexical-analysis]

In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. A program or function that performs lexical analysis is called a lexical analyzer, lexer, tokenizer, or scanner.

The lexical syntax is usually a regular language, whose atoms are individual characters, while the phrase syntax is usually a context-free language, whose atoms are words (tokens produced by the lexer). This separation is common but not required: in scannerless parsing, the lexer is combined with the parser.
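
As a rough illustration of the character-to-token step described above, here is a minimal sketch of a regex-driven lexer in Python. The token classes (`NUMBER`, `IDENT`, `OP`, `SKIP`) and the names `Token`, `TOKEN_SPEC`, and `tokenize` are assumptions made up for this example; they do not come from any particular tool.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # token class, e.g. "NUMBER"
    text: str  # the matched characters (the lexeme)


# Hypothetical token classes for a tiny arithmetic language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/^()=]"),
    ("SKIP", r"\s+"),  # whitespace is matched but never emitted
]

# One alternation of named groups; the name of the group that matched
# tells us which token class was recognized.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens from source, raising SyntaxError on an unknown character."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = match.end()
        if match.lastgroup != "SKIP":
            yield Token(match.lastgroup, match.group())


if __name__ == "__main__":
    for token in tokenize("x = 8/2 + 4"):
        print(token)
```

A parser would then consume these `Token` values as its atoms instead of raw characters. Each token class here is a regular expression, which matches the usual case where the lexical syntax is a regular language.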

843 questions
11
votes
7 answers

Parsing Meaning from Text

I realize this is a broad topic, but I'm looking for a good primer on parsing meaning from text, ideally in Python. As an example of what I'm looking to do, if a user makes a blog post like: "Manny Ramirez makes his return for the Dodgers today…
Tom
10
votes
5 answers

What JavaScript constructs does JsLex incorrectly lex?

JsLex is a JavaScript lexer I've written in Python. It does a good job for a day's work (or so), but I'm sure there are cases it gets wrong. In particular, it doesn't understand anything about semicolon insertion, and there are probably ways…
Ned Batchelder
10
votes
1 answer

How do I implement an if statement in Flex/Bison?

I don't understand the error. Could you please help me out? Here are the .l and .y files. Thanks. %{ #include "ifanw.tab.h" extern int yylval; %} %% "=" { return EQ; } "!=" { return NE; } "<" { return LT; } "<=" { return LE; } ">" { return…
Imran
10
votes
3 answers

How to use yylval with strings in yacc

I want to pass the actual string of a token. If I have a token called ID, then I want my yacc file to actually know what ID is called. I think I have to pass a string using yylval to the yacc file from the flex file. How do I do that?
neuromancer
10
votes
1 answer

DFAs vs Regexes when implementing a lexical analyzer?

(I'm just learning how to write a compiler, so please correct me if I make any incorrect claims) Why would anyone still implement DFAs in code (goto statements, table-driven implementations) when they can simply use regular expressions? As far as I…
Marco Petersen
10
votes
3 answers

Writing a Z80 assembler - lexing ASM and building a parse tree using composition?

I'm very new to the concept of writing an assembler and even after reading a great deal of material, I'm still having difficulties wrapping my head around a couple of concepts. What is the process to actually break up a source file into tokens? I…
Gary Paluk
9
votes
1 answer

What does `InputElementDiv` stand for in ECMAScript lexical grammar

The lexical grammar of ECMAScript lists the following token classes for lexical analyzer (lexer): InputElementDiv:: WhiteSpace LineTerminator Comment CommonToken DivPunctuator RightBracePunctuator InputElementRegExp:: …
Max Koretskyi
9
votes
1 answer

What is the meaning of yytext[0]?

What is the meaning of yytext[0]? And why should we use it in lex and yacc programs? I'm a learner, so please don't mind if it is a silly question.
sandy
9
votes
1 answer

Writing re-entrant lexer with Flex

I'm a newbie to flex. I'm trying to write a simple re-entrant lexer/scanner with flex. The lexer definition goes below. I get stuck with compilation errors as shown below (yyg issue): reentrant.l: /* Definitions */ digit [0-9] letter …
Viet
9
votes
2 answers

Regular expressions - Matching whitespace

I am having a big problem writing a regexp that will trim all the whitespace in my input. I have tried \s+ and [ \t\t\r]+ but they don't work. I need this because I am writing a scanner using flex, and I am stuck at matching whitespace. The…
mrjasmin
9
votes
2 answers

Writing a transpiler to the point where the actual mapping takes place

I want to understand how a transpiler works. The best way to do this is to write one, of course. I've been looking into a few resources to understand how this works, theoretically. And I understand the following: From what I understand I basically need to…
w00
8
votes
7 answers

How do I write a parser in C or Objective-C without a parser generator?

I am trying to make a calculator in C or Objective-C that accepts a string along the lines of 8/2+4(3*9)^2 and returns the answer 2920. I would prefer not to use a generator like Lex or Yacc, so I want to code it from the ground up. How should I go…
22222222
8
votes
6 answers

What can create a lexical error in C?

Besides not closing a comment /*..., what constitutes a lexical error in C?
DrBeco
8
votes
1 answer

Parsing structured text in Ruby

There are several questions on SO about parsing structured text in Ruby, but none of them apply to my case. I'm the author of the Ruby Whois library. The library includes several parsers to parse a WHOIS response and extract the properties from the…
Simone Carletti
8
votes
1 answer

ANTLR4: Any difference between import and tokenVocab?

The import statement or the tokenVocab option can be put in a parser grammar to reuse a lexer grammar. Sam Harwell advises to always use tokenVocab rather than import [1]. Is there any difference between import and tokenVocab? If there's no…
Roger Costello