Questions tagged [lexical-analysis]

In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. A program or function that performs lexical analysis is called a lexical analyzer, lexer, tokenizer, or scanner.

The lexical syntax is usually a regular language, whose atoms are individual characters, while the phrase syntax is usually a context-free language, whose atoms are words (tokens produced by the lexer). This separation is common but not required: in scannerless parsing, the lexer is combined with the parser.
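
As a rough illustration of the definition above, here is a minimal sketch of a lexer in Python that turns a character sequence into a token sequence, with each token class described by a regular expression. The token classes (NUMBER, IDENT, OP, SKIP), the `Token` type, and the `tokenize` function are assumptions made up for this example; they are not taken from any particular tool or language.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # name of the token class, e.g. "IDENT"
    text: str  # the exact characters that were matched


# Hypothetical token classes for illustration; each is a regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),  # whitespace is matched but never emitted
]

# Combine the classes into one pattern with a named group per class.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens from source, raising SyntaxError on an unknown character."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = match.end()
        if match.lastgroup != "SKIP":
            yield Token(match.lastgroup, match.group())


if __name__ == "__main__":
    for token in tokenize("x = 42 + y1"):
        print(token)
```

A parser would then consume these tokens as its atoms instead of working on individual characters.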

843 questions

3 answers

How to recognize a set of keywords in a text

I have a huge set of keywords. Given a text, I want to recognize only the words that occur in the keyword list and ignore all the other words. What is the best way to approach this?

algorithm lexical-analysis

asked May 20 '11 at 16:04

kc3

2 answers

Is the dot of dot notation an operator or something else? How do you know?

I am trying to classify the "dot" token used in the dot notation (object.property). Being a self-taught amateur developer, mainly using JavaScript, I have a simplified (and certainly imperfect) understanding of programming and JavaScript. When…

javascript token semantics lexical-analysis formal-languages

asked Jun 25 '19 at 12:48

mel

2 answers

Where can I find the full syntax of C that is necessary to implement a compiler?

My aim is not to write a C compiler; however, I do require the full syntax of the C programming language. This will allow me to write programs to format, manage, and analyze C programs and libraries more easily. To achieve that, I have no option…

c parsing syntax implementation lexical-analysis

asked Jan 26 '19 at 23:23

machine_1

2 answers

How can lexing efficiency be improved?

When parsing a large 3-gigabyte file with a DCG, efficiency is important. The current version of my lexer mostly uses the or predicate ;/2, but I read that indexing can help. Indexing is a technique used to quickly select candidate clauses of a …

performance prolog tokenize lexical-analysis

asked Jan 18 '19 at 18:35

Guy Coder

1 answer

How to properly scan for identifiers using Ragel

I'm trying to write a scanner for my C/C++/C#/Java/D-like programming language that I'm designing for personal reasons. For this task I'm using Ragel to generate my scanner. I'm having trouble understanding exactly when a lot of the operators…

lexical-analysis ragel

asked Mar 06 '11 at 16:21

Sion Sheevok

1 answer

Get Prolog DCG arguments generated from sentence being parsed

I'm putting together a lexer/parser for a simple programming language using a Prolog DCG that builds up the list of tokens/syntax tree using DCG arguments, e.g. symbol(semicolon) --> ";". symbol(if) --> "if". and then the syntax tree is built using…

parsing prolog lexical-analysis dcg

asked Nov 07 '17 at 15:31

user2396812

1 answer

Function of the various Lexer commands in ANTLR4. Is my interpretation correct? What does each of them do?

I have started learning to write a lexer in ANTLR 4.5. From this page, which serves as documentation, I see that the following Lexer commands exist: more, pushMode(x), popMode, type(x), channel(x), mode(x), skip. I have not been able to clearly…

command antlr antlr4 lexer lexical-analysis

asked Apr 17 '17 at 20:04

GoodDeeds

1 answer

Are there tools to check whether a Fortran procedure modifies its arguments?

Are there tools that can be used to check which arguments of a Fortran procedure are defined inside the procedure? I mean something like a lexical analyzer that simply checks if a variable is being used on the left hand side of an…

function parsing fortran arguments lexical-analysis

asked Aug 11 '16 at 21:06

innoSPG

1 answer

Does PLY's lexer support "maximal munch"?

The syntax of many programming languages requires that they be tokenized according to the "maximal munch" principle. That is, that tokens be built from the maximum possible number of characters from the input stream. PLY's lexer does not seem to…

python regex lexical-analysis ply

asked Mar 13 '16 at 22:02

user200783

5 answers

Find the Range of the Nth word in a String

What I want is something like "word1 word2 word3".rangeOfWord(2) => 6 to 10 The result could come as a Range or a tuple or whatever. I'd rather not do the brute force of iterating over the characters and using a state machine. Why reinvent the…

swift string lexical-analysis

asked Dec 23 '15 at 22:12

Andrew Duncan

2 answers

Recognize Identifiers in Chinese characters by using Lex/Yacc

How can I use Lex/Yacc to recognize identifiers in Chinese characters?

lex lexical-analysis

asked Jun 28 '10 at 13:31

WuFa

3 answers

Regular expressions versus lexical analyzers in Haskell

I'm getting started with Haskell and I'm trying to use the Alex tool to create regular expressions, and I'm a little bit lost; my first difficulty was the compile step. How do I compile a file with Alex? Then, I think that I have to…

regex haskell lexical-analysis alex

asked Jun 21 '10 at 22:26

Anny

2 answers

Regular expression for HTML tags

I am working on a lexical analyzer. I have an HTML file. I want to convert every letter in the file, except whatever is written within an HTML tag, into a capital letter. Example: StackOverFlow This will be…

html regex lex lexical-analysis

asked May 05 '15 at 21:11

Surajeet Bharati

3 answers

Including an external header file in Flex

I am writing a program using flex that takes input from a text file and splits it into tokens such as identifiers, keywords, operators, etc. My file name is test.l. I have made another hash table program which includes a file named SymbolTable.h .…

c++ flex-lexer lexical-analysis

asked Apr 18 '15 at 13:46

SKB

3 answers

ANTLR4: lexer rule for: Any string as long as it doesn't contain these two side-by-side characters?

Is there any way to express this in ANTLR4: Any string as long as it doesn't contain the asterisk immediately followed by a forward slash? This doesn't work: (~'*/')* as ANTLR throws this error: multi-character literals are not allowed in lexer…

antlr grammar antlr4 lexer lexical-analysis

asked Apr 16 '15 at 09:06

Roger Costello
