I'm using difflib's SequenceMatcher (the ratio() method) to measure similarity between text files. difflib is relatively fast for a small set of files: comparing 10 files of about 70 KB each against each other (45 pairwise comparisons) takes about 80 seconds.
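For reference, here is a minimal sketch of this kind of all-pairs setup. It is an assumption about the setup, not the poster's actual code: `load_texts` and `pairwise_ratios` are hypothetical helper names. Each file is read only once, so the time that remains is spent inside SequenceMatcher itself.

```python
import itertools
from difflib import SequenceMatcher
from pathlib import Path


def load_texts(paths):
    """Read every file once into a {path: text} dict (hypothetical helper)."""
    return {str(p): Path(p).read_text(encoding="utf-8", errors="replace") for p in paths}


def pairwise_ratios(texts):
    """Return {(name_a, name_b): ratio} for every unordered pair in texts."""
    results = {}
    for a, b in itertools.combinations(sorted(texts), 2):
        # ratio() is 2*M/T: M = number of matched characters,
        # T = combined length of both strings.
        results[(a, b)] = SequenceMatcher(None, texts[a], texts[b]).ratio()
    return results
```

Usage would be something like `pairwise_ratios(load_texts(Path("corpus").glob("*.txt")))`, where `corpus` is a placeholder directory name.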
The problem is that I have a collection of 3,000 .txt files (75 KB on average), and a rough estimate of how long SequenceMatcher would need to compare all of them is 80 days!
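The jump from seconds to months comes mostly from the number of pairs, which grows quadratically with the number of files. This small calculation (`pair_count` is an illustrative name) compares the 10-file test against the full 3,000-file collection and ignores the slightly larger average file size:

```python
from math import comb


def pair_count(n_files):
    """Number of unordered pairs among n_files files: n * (n - 1) / 2."""
    return comb(n_files, 2)


small = pair_count(10)    # the 10-file test set
large = pair_count(3000)  # the full collection
print(small, large, round(large / small))  # 45 4498500 99967
```

So the full job involves roughly 100,000 times as many comparisons as the test run. That is why a cost of a second or two per pair adds up to months.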
I tried the real_quick_ratio() and quick_ratio() methods, but they don't fit our needs.
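Per the difflib documentation, real_quick_ratio() and quick_ratio() return upper bounds on ratio(). On their own they are not the real score, but they can still serve as cheap filters: if a bound is already below a similarity threshold, the expensive ratio() call can be skipped. The sketch below assumes you only care about pairs above some threshold (0.8 here is an arbitrary example value). It also reuses one SequenceMatcher with set_seq2(), because difflib caches information about the second sequence. How much time this saves depends entirely on how many pairs survive the filters.

```python
from difflib import SequenceMatcher


def similar_pairs(texts, threshold=0.8):
    """Yield (name_a, name_b, ratio) for pairs whose exact ratio() >= threshold.

    real_quick_ratio() and quick_ratio() are upper bounds on ratio(),
    so a pair whose bound is already below the threshold is skipped safely.
    """
    names = sorted(texts)
    sm = SequenceMatcher()
    for i, a in enumerate(names):
        # SequenceMatcher caches data about seq2, so set it once per outer loop.
        sm.set_seq2(texts[a])
        for b in names[i + 1:]:
            sm.set_seq1(texts[b])
            if sm.real_quick_ratio() < threshold:  # O(1), based on lengths only
                continue
            if sm.quick_ratio() < threshold:  # character-count bound
                continue
            r = sm.ratio()  # the expensive exact computation
            if r >= threshold:
                yield a, b, r
```

Note that ratio() is not guaranteed to be symmetric, so swapping the two texts can change the score slightly.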
Is there any way to speed up the comparison? If not, is there a faster method for this task, even one that isn't in Python?
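One commonly used faster alternative swaps in a different measure: the Jaccard similarity of word shingles (sets of overlapping k-word sequences). This is not the same score as ratio(), so thresholds would need re-tuning. However, each file is tokenized only once, and each pairwise comparison becomes a set intersection in C rather than a character-level sequence alignment. The sketch below uses only the standard library. The function names and the default `k=5` are illustrative assumptions. For even larger collections, MinHash and locality-sensitive hashing approximate this same measure without comparing every pair.

```python
import itertools
import re


def shingles(text, k=5):
    """Return the set of k-word shingles (contiguous word k-grams) of text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Texts shorter than k words become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def pairwise_jaccard(texts, k=5):
    """Return {(name_a, name_b): jaccard} over all unordered pairs of texts."""
    sets = {name: shingles(t, k) for name, t in texts.items()}
    return {
        (a, b): jaccard(sets[a], sets[b])
        for a, b in itertools.combinations(sorted(sets), 2)
    }
```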