I'm very impressed by how plagiarism checkers (such as the Turnitin website) work. But how do they do it so effectively? I'm new to this area, so is there a word-matching algorithm, or something similar, that is used for detecting similar sentences?
Thank you very much.

Xitrum
[Wikipedia](http://en.wikipedia.org/wiki/Plagiarism_detection) didn't help? – Till Nov 26 '13 at 23:27
1 Answer
I'm sure many real-world plagiarism detection systems use more sophisticated schemes, but the general problem of measuring how far apart two things are is usually framed in terms of edit distance. Many common algorithms exist for computing it, Levenshtein distance being the best known. The gist is answering the question "How many edits must I perform to turn one input into the other?". The challenge for real-world systems is doing this efficiently across a large corpus. A related problem is the longest common subsequence, which might also be useful for such schemes: a long common subsequence between two texts points to material they share in the same order, although its elements need not be contiguous, so it is only a rough signal for passages copied verbatim.
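
To make the two ideas concrete, here is a minimal sketch in Python. It is only an illustration, not how Turnitin or any real checker is known to work; the function names `edit_distance` and `lcs_length` are my own, and I'm assuming Levenshtein distance (insert, delete, substitute, each costing 1) as the edit distance. Both functions accept any sequences, so you can compare character by character or word by word.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of insertions, deletions
    and substitutions needed to turn sequence a into sequence b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty sequence takes i deletions
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute, or keep a match
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b
    (elements in the same order, not necessarily contiguous)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Character-level comparison
print(edit_distance("kitten", "sitting"))

# Word-level comparison of two sentences
s1 = "the quick brown fox jumps over the lazy dog".split()
s2 = "the quick red fox leaps over the lazy dog".split()
print(edit_distance(s1, s2), lcs_length(s1, s2))
```

Both functions take time proportional to `len(a) * len(b)`, which is fine for a pair of sentences but is exactly why running them naively against a large corpus is the hard part.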

Gian