I need to write a lexer for a Java source code plagiarism detector. Here is an example of what I want to achieve:
// Java code                                  Tokens:
public class Count {                          Begin Class
  public static void main(String[] args)      Var Def, Begin Method
      throws java.io.IOException {
    int count = 0;                            Var Def, Assign
    while (System.in.read() != -1)            Apply, Begin While
      count++;                                Assign, End While
    System.out.println(count+" chars.");      Apply
  }                                           End Method
}                                             End Class
I think JFlex is the right tool to generate the lexer. However, after looking through some examples, I cannot find a way to distinguish the braces that delimit a class from the braces that delimit a method; most tokenizers I have found recognize them as the same token. Also, how do I distinguish a method call (Apply) from a plain variable identifier?
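
One possible way to approach both questions is to keep the context outside the regular expressions: a flat lexer produces plain lexemes, and a small pass over them keeps a stack of open braces and looks one lexeme ahead. The sketch below is plain Java, not JFlex, and only illustrates that idea; the class name BraceContextSketch, the method tokenize, and the keyword list are hypothetical names made up for illustration. It assumes that an identifier followed by "(" directly inside a class body is a method declaration and that anywhere else it is a call. It ignores comments, nested or anonymous classes, lambdas, annotations with arguments, and calls in field initializers, and it emits only the Begin/End and Apply tokens.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BraceContextSketch {
    // A lexeme is a string literal, an identifier/keyword, or any single non-space character.
    private static final Pattern LEXEME =
        Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"|[A-Za-z_$][A-Za-z0-9_$]*|\\S");

    // Keywords that may be followed by '(' without being a call (assumed, incomplete list).
    private static final List<String> CONTROL_KEYWORDS =
        Arrays.asList("if", "while", "for", "switch", "catch", "synchronized", "return");

    static List<String> tokenize(String source) {
        // Step 1: split the source into flat lexemes.
        List<String> lexemes = new ArrayList<>();
        Matcher m = LEXEME.matcher(source);
        while (m.find()) {
            lexemes.add(m.group());
        }

        // Step 2: walk the lexemes with a stack that remembers what each '{' opened.
        List<String> tokens = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>();
        String pending = null; // what the next '{' is expected to open
        for (int i = 0; i < lexemes.size(); i++) {
            String lx = lexemes.get(i);
            String next = (i + 1 < lexemes.size()) ? lexemes.get(i + 1) : "";
            if (lx.equals("class")) {
                pending = "Class";
            } else if (lx.equals(";")) {
                pending = null; // e.g. an abstract method: no body follows
            } else if (lx.equals("{")) {
                String kind = (pending != null) ? pending : "Block";
                open.push(kind);
                tokens.add("Begin " + kind);
                pending = null;
            } else if (lx.equals("}")) {
                if (!open.isEmpty()) {
                    tokens.add("End " + open.pop());
                }
            } else if (isIdentifier(lx) && next.equals("(")) {
                if ("Class".equals(open.peek())) {
                    pending = "Method"; // declaration directly inside a class body
                } else {
                    tokens.add("Apply"); // call inside a method body or block
                }
            }
        }
        return tokens;
    }

    private static boolean isIdentifier(String lx) {
        return Character.isJavaIdentifierStart(lx.charAt(0)) && !CONTROL_KEYWORDS.contains(lx);
    }

    public static void main(String[] args) {
        String source =
            "public class Count {"
            + " public static void main(String[] args) throws java.io.IOException {"
            + " int count = 0;"
            + " while (System.in.read() != -1) count++;"
            + " System.out.println(count+\" chars.\");"
            + " } }";
        // prints [Begin Class, Begin Method, Apply, Apply, End Method, End Class]
        System.out.println(tokenize(source));
    }
}
```

The same bookkeeping could presumably live in the action code of a generated lexer, or the lexer could stay flat and leave the distinction to a parser; which of these fits a plagiarism detector best is a design choice the sketch does not settle.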