Possible Duplicate:
How to create a plagiarism detector of c++ files
a simple lexer.cpp to convert a simle c++ file to a sequence of tokens

Hi, I have a project: a plagiarism detector for C++ files.

I need to know how to convert a C++ file into a sequence of tokens. For example, I want to turn this:

int factorial(int n) {
if (n == 0) return 1 ;
else return n * factorial(n-1) ;
}

into this:

int, factorial, (, int, n, ), {, if, (, n, ==, 0, ), return, 1, ;, else, return, n, *, factorial, (, n, -, 1, ), ;, }
Community
moradpro

1 Answer

One typically writes a tokenizer with a lexer generator such as Flex, or with the lexer part of a parser generator such as ANTLR. A lexer and parser for the C++ grammar, written in lex and yacc, are available.

These lexers boil down (to an extent) to a set of regular expressions, plus some code for switching between modes (e.g. string mode, comment mode, and language mode).
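
To make the "lots of regular expressions" idea concrete, here is a minimal hand-rolled sketch using std::regex instead of a generated lexer. The function name tokenize and the token classes are my own assumptions for illustration, not part of Flex or ANTLR. It folds the comment and string "modes" into single regex alternatives, silently skips whitespace and any character it does not recognize, and ignores the preprocessor, raw strings, and hex or suffixed numbers, so treat it as a starting point for small files rather than a real C++ lexer.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from any library): splits C++-like source text
// into token strings. Comments are matched and dropped; whitespace and
// unrecognized characters are skipped because no alternative matches them.
std::vector<std::string> tokenize(const std::string& source) {
    // ECMAScript alternation is ordered, so comments are tried before the
    // '/' operator, and two-character operators before single characters.
    static const std::regex token_re(
        R"re((//[^\n]*|/\*[\s\S]*?\*/))re"                 // 1: line and block comments
        R"re(|("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'))re"    // 2: string and char literals
        R"re(|([A-Za-z_]\w*))re"                           // 3: identifiers and keywords
        R"re(|(\d+(?:\.\d+)?))re"                          // 4: simple decimal numbers
        R"re(|(==|!=|<=|>=|&&|\|\||\+\+|--|->|<<|>>|::)re" // 5: two-character operators
        R"re(|[-+*/%=<>!&|^~?:;,.(){}\[\]]))re"            //    or single-character punctuation
    );

    std::vector<std::string> tokens;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(source.begin(), source.end(), token_re); it != end; ++it) {
        if ((*it)[1].matched) {
            continue;  // group 1 matched: this is a comment, so drop it
        }
        tokens.push_back(it->str());
    }
    return tokens;
}
```

Calling tokenize on the factorial function from the question yields the sequence shown there: int, factorial, (, int, n, ), and so on. For plagiarism detection you would probably go one step further and replace each identifier with a generic placeholder, so that renaming variables does not hide a copy; that step is not shown here.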

Lesmana
Ken Bloom