Performance issue: edit distance for large strings, LCP vs Levenshtein vs SIFT

Question

I'm trying to calculate the distance between two large strings (about 20-100). Performance is the obstacle: I need to run 20k distance comparisons, and it takes hours.

After investigating, I came across a few algorithms, and I'm having trouble deciding which one to choose (based on performance vs. accuracy).

https://github.com/tdebatty/java-string-similarity - performance list for each of the algorithms.
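For context, the baseline in this comparison is the classic Levenshtein distance. The sketch below is the textbook two-row dynamic-programming version, which takes time proportional to the product of the two string lengths. It is not the implementation from the linked library, and the class and method names (`LevenshteinSketch`, `distance`) are made up for illustration.

```java
final class LevenshteinSketch {

    // Textbook Levenshtein distance: minimum number of single-character
    // insertions, deletions and substitutions that turn a into b.
    // Time is O(a.length() * b.length()); memory is two rows of b.length() + 1.
    static int distance(String a, String b) {
        if (a.equals(b)) return 0;
        if (a.isEmpty()) return b.length();
        if (b.isEmpty()) return a.length();

        int[] prev = new int[b.length() + 1]; // distances for the previous row
        int[] curr = new int[b.length() + 1]; // distances for the current row

        // Row 0: turning the empty prefix of a into b[0..j) costs j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting i characters of a yields the empty string
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap rows so the current row becomes the previous one.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

Calling `LevenshteinSketch.distance("kitten", "sitting")` returns 3. Timing a loop of such calls over the real data would give a baseline to compare the other algorithms against.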

** EDITED **

Is the SIFT4 algorithm well-proven / reliable?
Is SIFT4 the right algorithm for the task?
How come it's so much faster than the LCP-based / Levenshtein algorithms?
Is SIFT also used in image processing, or is it a different thing? (answered by AMH)

Thanks.

Alireza · Answer 1 · 2017-06-06T13:47:54.037

2

As far as I know, the scale-invariant feature transform (SIFT) is a computer vision algorithm that detects and describes local features in images.

Also, if you want to find similar images, you compare the local features of the images to each other by calculating their distance, which may be close to what you intend to do. But the local features are vectors of numbers, as I remember. It uses a brute-force matcher: Feature Matching - OpenCV Library - SIFT
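To illustrate what brute-force matching of feature vectors means, here is a rough sketch: for every descriptor in one set, scan all descriptors in the other set and keep the closest one. It is plain Java, not OpenCV's API, and it assumes Euclidean distance and descriptors stored as `double[]`; the names `BruteForceMatchSketch`, `euclidean` and `match` are made up.

```java
final class BruteForceMatchSketch {

    // Euclidean distance between two descriptors of equal length.
    static double euclidean(double[] u, double[] v) {
        double sum = 0.0;
        for (int k = 0; k < u.length; k++) {
            double diff = u[k] - v[k];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // For each query descriptor, return the index of the nearest train
    // descriptor (or -1 if train is empty). Every pair is compared, so the
    // cost is O(query.length * train.length) distance computations.
    static int[] match(double[][] query, double[][] train) {
        int[] best = new int[query.length];
        for (int i = 0; i < query.length; i++) {
            double bestDistance = Double.POSITIVE_INFINITY;
            best[i] = -1;
            for (int j = 0; j < train.length; j++) {
                double d = euclidean(query[i], train[j]);
                if (d < bestDistance) {
                    bestDistance = d;
                    best[i] = j;
                }
            }
        }
        return best;
    }
}
```

This works on numeric vectors, so it is not a drop-in replacement for a string edit distance.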

Please read about SIFT here: http://docs.opencv.org/3.1.0/da/df5/tutorial_py_sift_intro.html

SIFT4, which is mentioned in the link you provided, is a completely different thing.

edited Jun 06 '17 at 13:47

answered Jun 05 '17 at 18:19

Alireza


I can't select this as the answer. Since I was not clear, the answer you provided is not what I intended to ask. I still think your answer is valuable for this thread, so I upvoted it. Thank you. – Adi Darachi Jun 06 '17 at 08:45
