How can I implement the Damerau-Levenshtein distance algorithm to detect plagiarism in documents? Thanks!
Google and ask here pure technical questions. I assume nobody will answer you something like this. – Andrejs Cainikovs Oct 13 '09 at 06:39
The Wikipedia article should help you get started. – Henning Oct 13 '09 at 06:40
1 Answer
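Levenshtein distance is primarily used to compare two short strings, for example when matching names or finding alternatives in a spell checker. Applying it to whole documents to detect plagiarism is not typical, partly because the standard dynamic-programming computation takes time proportional to the product of the two lengths.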
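To make the algorithm concrete, here is a minimal Python sketch of the restricted variant of Damerau-Levenshtein distance (often called optimal string alignment), which counts insertions, deletions, substitutions, and swaps of adjacent items. The function name `osa_distance` is my own choice, and this is an illustration rather than a tuned plagiarism detector. It accepts any sequences, so you can pass lists of words instead of characters to compare text at the word level.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    a and b may be strings or lists of tokens (e.g. words).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i items of a and the first j of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i items
    for j in range(cols):
        d[0][j] = j  # insert all j items

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent items
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("abcd", "acbd"))
print(osa_distance("the quick brown fox".split(),
                   "the brown quick fox".split()))
```

Note that this restricted form does not allow a substring to be edited more than once, so it can differ from the unrestricted Damerau-Levenshtein distance on some inputs (for example `"ca"` versus `"abc"`).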
There is some work in this area, though. Most references point to this article, which requires a subscription:
Plagiarism Detection Using the Levenshtein Distance and Smith-Waterman Algorithm
http://www.computer.org/portal/web/csdl/doi/10.1109/ICICIC.2008.422
Plagiarism in texts is issues of increasing concern to the academic community. Now most common text plagiarism occurs by making a variety of minor alterations that include the insertion, deletion, or substitution of words. Such simple changes, however, require excessive string comparisons. In this paper, we present a hybrid plagiarism detection method. We investigate the use of a diagonal line, which is derived from Levenshtein distance, and simplified SmithWaterman algorithm that is a classical tool in the identification and quantification of local similarities in biological sequences, with a view to the application in the plagiarism detection. Our approach avoids globally involved string comparisons and considers psychological factors, which can yield significant speed-up by experiment results. Based on the results, we indicate the practicality of such improvement using Levenshtein distance and Smith-Waterman algorithm and to illustrate the efficiency gains. In the future, it would be interesting to explore appropriate heuristics in the area of text comparison
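I don't have access to the paper, so the following is not its hybrid method. It is only a sketch of the classic Smith-Waterman local alignment that the abstract mentions, applied to sequences of words. The function name `smith_waterman` and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions picked for illustration. A high score means the two texts share a long, mostly matching run, even if the rest of the documents differ.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, (end_i, end_j)) of the best local alignment.

    a and b may be strings or lists of tokens. The scoring values are
    illustrative assumptions, not taken from the cited paper.
    """
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, best_end = 0, (0, 0)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(
                0,                    # start a fresh local alignment
                h[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                h[i - 1][j] + gap,    # gap in b
                h[i][j - 1] + gap,    # gap in a
            )
            if h[i][j] > best:
                best, best_end = h[i][j], (i, j)
    return best, best_end


original = "we present a hybrid plagiarism detection method".split()
suspect = "in this essay we present a hybrid detection method too".split()
print(smith_waterman(original, suspect))
```

Unlike the edit distance, which scores the two inputs globally, this only rewards the best matching region, which is closer to how copied passages appear inside otherwise different documents.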
