I am working on an implementation of a source code plagiarism detection algorithm (the winnowing algorithm) and have run into a problem I need some help with.
Example: I have a string
String test = "blahello,,,,/blatestbla7234///§\"§$%\"%$\n\n23344)§()(§$blablayeahbla";
I transform this string to
test="blahelloblatestblablablayeahbla"
and from this string I build kgrams, for example 5-grams:
blahe lahel ahell hello ellob llobl .... ahbla
I save the kgrams in a list of strings, but I would also like to save the start and end position of every kgram in the original text, so that in the end I can map every kgram back to its position in the original text.
EDIT:
So my question is: how can I get the start and end position of a kgram in the original text? Can anyone help me there? Do you have any ideas? Thanks in advance.
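
One possible approach, as a minimal sketch rather than a definitive implementation: while filtering the original string, record the original index of every character that is kept, so each kgram can look up where its first and last character came from. The class and method names (KGramPositions, KGram, kGrams) are hypothetical, and the filter is assumed to keep only letters via Character.isLetter, which matches the example above but may differ from the real filter.

```java
import java.util.ArrayList;
import java.util.List;

class KGramPositions {

    /** A kgram plus its span in the original, unfiltered text (both indices inclusive). */
    static final class KGram {
        final String text;
        final int start; // original index of the kgram's first character
        final int end;   // original index of the kgram's last character

        KGram(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    static List<KGram> kGrams(String original, int k) {
        StringBuilder cleaned = new StringBuilder();
        // positions.get(i) is the index in 'original' of cleaned.charAt(i)
        List<Integer> positions = new ArrayList<>();

        // Assumption: only letters survive the filtering step.
        for (int i = 0; i < original.length(); i++) {
            char c = original.charAt(i);
            if (Character.isLetter(c)) {
                cleaned.append(c);
                positions.add(i);
            }
        }

        // Slide a window of length k over the cleaned text.
        List<KGram> result = new ArrayList<>();
        for (int i = 0; i + k <= cleaned.length(); i++) {
            result.add(new KGram(
                    cleaned.substring(i, i + k),
                    positions.get(i),
                    positions.get(i + k - 1)));
        }
        return result;
    }
}
```

For the input "blahello,,,,/bla" and k = 5, the kgram "ellob" would get start 4 and end 13, because the removed characters between "hello" and "bla" still count toward the original positions.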