I am trying to write a lexical analyzer in Java. The program must perform tokenization. I have beginner-level knowledge of compiler programming. I know there are many lexer generators available online, and I can use them to test my own lexical analyzer's output, but I need to write my own lexical analyzer. Can anyone please suggest some good references, articles, or ideas to help me start coding?
0
-
Is there anything in particular that you are having trouble with, or is it just the general idea? – John Kane Jul 22 '12 at 12:12
-
I have trouble recognizing keywords, comments, whitespace, etc. But I think I can handle them easily once I get the general idea. – Thabo Jul 22 '12 at 12:21
-
Do you want to build a lexical analysis tool to recognize the tokens of a particular language? Or do you want to build a lexer-generator tool that accepts descriptions of tokens and produces a lexical analysis tool? – Ira Baxter Jul 22 '12 at 16:15
-
@Ira Baxter Actually, I want to build a lexical analysis tool to recognize the tokens of Java source code. – Thabo Jul 22 '12 at 16:33
2 Answers
3
"Compilers: Principles, Techniques, and Tools" by Aho, Sethi, and Ullman has a chapter on lexical analysers. It covers a lot of the theory of regular expressions and finite automata, which are core to this problem domain.
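To make the regular-expression idea concrete for the keywords, comments, and whitespace mentioned in the question, here is a minimal sketch of a tokenizer built on `java.util.regex`. It is not taken from the book. The names `SimpleLexer`, `Token`, and `TokenType` are hypothetical, the keyword set is deliberately incomplete, and the patterns cover only a small subset of Java's lexical grammar (no character literals, hex numbers, or Unicode escapes), so treat it as a starting point rather than a complete Java lexer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: all class and member names here are hypothetical.
final class SimpleLexer {

    enum TokenType { KEYWORD, IDENTIFIER, NUMBER, STRING, COMMENT, OPERATOR }

    static final class Token {
        final TokenType type;
        final String text;

        Token(TokenType type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + "(" + text + ")";
        }
    }

    // Assumption: a small, incomplete subset of Java keywords, enough for a demo.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "class", "public", "private", "static", "void", "int", "double",
            "boolean", "if", "else", "while", "for", "return", "new"));

    // One alternative per token class. Order matters: comments must be tried
    // before operators so that "//" is not read as two division signs.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<WS>\\s+)"
            + "|(?<COMMENT>//[^\\n]*|/\\*[\\s\\S]*?\\*/)"
            + "|(?<STRING>\"(?:\\\\.|[^\"\\\\\\n])*\")"
            + "|(?<NUMBER>\\d+(?:\\.\\d+)?)"
            + "|(?<WORD>[A-Za-z_$][A-Za-z0-9_$]*)"
            + "|(?<OP>==|!=|<=|>=|&&|\\|\\||\\+\\+|--|[-+*/%=<>!&|^~?:;,.(){}\\[\\]])");

    static List<Token> tokenize(String source) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        int pos = 0;
        while (pos < source.length()) {
            // Restrict matching to the unread input and require a match at its start.
            m.region(pos, source.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                        "Unexpected character '" + source.charAt(pos) + "' at offset " + pos);
            }
            String text = m.group();
            if (m.group("COMMENT") != null) {
                tokens.add(new Token(TokenType.COMMENT, text));
            } else if (m.group("STRING") != null) {
                tokens.add(new Token(TokenType.STRING, text));
            } else if (m.group("NUMBER") != null) {
                tokens.add(new Token(TokenType.NUMBER, text));
            } else if (m.group("WORD") != null) {
                // Keywords look like identifiers, so classify them by set lookup.
                TokenType type = KEYWORDS.contains(text) ? TokenType.KEYWORD : TokenType.IDENTIFIER;
                tokens.add(new Token(type, text));
            } else if (m.group("OP") != null) {
                tokens.add(new Token(TokenType.OPERATOR, text));
            }
            // Whitespace (group WS) is matched but produces no token.
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("int x = 42; // the answer")) {
            System.out.println(t);
        }
    }
}
```

Each regex alternative corresponds to one token class, which is the same idea the finite-automata theory formalizes. A hand-written scanner that reads one character at a time would avoid the regex engine entirely, at the cost of more code.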

Stephen C
1
I would try taking a look at the source code for some of the better ones out there. I have used SableCC in the past. If you go to this page describing how to set up your environment, there is a link to the source code for it. ANTLR is also a really commonly used one. Here is the source code for it.
Also, The Dragon Book is really good.
As suggested by SK-logic, I am adding Modern Compiler Implementation as another option.

John Kane
-
Why does the Dragon Book have a 2nd edition? Is it outdated? I think the book mentioned in Stephen C's answer is the same book. – Thabo Jul 23 '12 at 09:53
-
@SK-logic, do you have any other suggestions? That is the book that I worked with when I took a course on compilers not that long ago, and I found it to be decent. – John Kane Jul 23 '12 at 14:01
-
@Thabo yeah, I think that you are right. I just have never heard anyone call it by its actual name. – John Kane Jul 23 '12 at 14:02
-
@JohnKane, there's a lot of papers on GLR, PEGs and all that. Dragon Book promotes techniques that are totally irrelevant nowadays. – SK-logic Jul 23 '12 at 14:17
-
@SK-logic have you read the updated version that came out in 2006? They added sections on JIT compiling, garbage collection, parallelism, dynamic compilation ... This book has been (and I believe still basically is) one of the standard books (if not the de facto book) for compiler theory. It is really thorough and gives a lot of detail about what issues there are to deal with. Yes, there are current research papers describing new approaches. However, they are most likely not going to help someone who stated that they have a "beginner level knowledge in compiler programming". – John Kane Jul 23 '12 at 16:06
-
@JohnKane, the book improved a little bit, but still - no SSA or CPS, no sea of nodes, no Packrat parsing, none of the modern register allocation algorithms. It blurs what is really important and goes into deep details of the irrelevant. Parsing in general is pretty much irrelevant, and it is still occupying more than a half of a book. And all the parsing algorithms there are outdated. I would never recommend this book for a beginner. Appel is much better in terms of both a learning curve and being up to date. – SK-logic Jul 23 '12 at 16:15
-
@SK-logic I edited my answer and added that as another resource. I would not really consider parsing irrelevant (especially since the question was specifically asking about it). There are things not covered in this book that others do cover, but they stress the issues that need to be addressed in a language neutral way. They have stated that they wrote it like this with the belief that most people reading it will not write a full compiler and instead focus on the issues at hand. I tend to see things that way as well. – John Kane Jul 23 '12 at 16:53
-
@JohnKane, the worst thing about the Dragon Book is that everyone who read it would think that compilers are complex and tricky, whereas in fact compilation is one of the simplest things out there. And their disproportional emphasis on parsing is clearly harmful. Writing compilers is really, really easy - unless you read that horrible book. – SK-logic Jul 23 '12 at 17:01
-
@SK-logic That is your opinion. I have read it, along with many other people who have found it to be very useful. I have also seen many people review Modern Compiler Implementation with the same complaints that you are giving, along with complaints about horrible examples. Also, they talk about lexical and syntax analysis for two chapters out of 12-13. I am not sure how that is disproportional or harmful. I think we have both made our points, though, and this doesn't need to continue. – John Kane Jul 23 '12 at 17:16