Questions tagged [levenshtein-distance]

A metric for measuring the amount of difference between two sequences. The Levenshtein distance allows deletion, insertion and substitution.

In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences. The Levenshtein distance between two strings is defined as the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one string into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.

Levenshtein distance is one specific edit distance: the one whose allowed operations are insertion, deletion, and substitution.
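
As a rough illustration of the definition above, here is a minimal Python sketch of the usual dynamic-programming approach, keeping only two rows of the table at a time. The function name `levenshtein` and its parameter names are illustrative choices, not taken from any particular library.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein distance between a and b.

    Illustrative sketch: two-row dynamic programming, O(len(a) * len(b)) time.
    """
    # The distance is symmetric, so swap to keep the shorter string
    # in the inner loop (this bounds the row length).
    if len(a) < len(b):
        a, b = b, a

    # previous[j] = distance between the first i-1 chars of a and first j chars of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        previous = current
    return previous[-1]
```

For example, `levenshtein("kitten", "sitting")` evaluates to 3: substitute k with s, substitute e with i, and insert g.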

967 questions
0
votes
2 answers

Levenshtein Distance of two words from text file with Python

I have a small 30 line text file with two similar words on each line. I need to calculate the levenshtein distance between the two words on each line. I also need to use a memoize function while calculating the distance. I am pretty new to Python…
Ty Bailey
  • 2,392
  • 11
  • 46
  • 79
0
votes
1 answer

How do I compare 1 word with many and output a list of Levenshtein scores

I have a form where I can input two words then compare the levenshtein score, that works fine. I want to be able to compare 1 word with a string of words delimited by ", ". The whole lot then needs to echo out. Here's what I have so far: Levenstien…
user1721230
  • 317
  • 1
  • 6
  • 19
0
votes
1 answer

using levenshtein distance ratio to compare 2 records

I've created the MySQL user function using the Levenshtein distance and ratio source codes. I am comparing 2 records, and based on a 75% match I want to select the record. An order comes into table paypal_ipn_orders with an ITEM title. A query executes…
user1542036
  • 423
  • 2
  • 5
  • 9
0
votes
1 answer

Comparing 2 strings to find if they contain the same words with java

I am using Levenshtein distance which is a string metric for measuring the amount of difference between two sequences to find the percent of difference between two strings. I want to use a better method to declare the strings are similar using words…
PrettyGirl
  • 31
  • 1
  • 2
0
votes
1 answer

Any known javascript/php dictionaries like 'word1', 'word2'?

Just recently I was looking up about Levenshtein algorithm and after searching for an hour I couldn't find a javascript file like: var dictionary = [ 'coke', 'cokeman', 'cokeney' ] Is there a faster way to do this? I…
keji
  • 5,947
  • 3
  • 31
  • 47
0
votes
1 answer

levenshtein distance with items in list in python

I have two lists, below, and I want to find words that are similar, with a Levenshtein distance of less than 2. I have a function to find the Levenshtein distance, but it takes the two words as parameters. I can find which words are not in the…
jacobLoz
  • 13
  • 1
  • 6
0
votes
4 answers

MySQL showing as array

I'm trying to get this code to work and for the life of me cannot get it going. I want a search that shows a "Did you mean". With the code I have, all I get is "Did you mean: Array l:6". What is wrong with what I have here? $my_word =…
David Morin
  • 485
  • 3
  • 5
  • 16
0
votes
2 answers

Is there any modified Minimum Edit Distance (Levenshtein Distance) for incomplete strings?

I have sequences built from 0's and 1's. I want to somehow measure their distance from a target string, but the target string is incomplete. Example of the data I have, where x is the target string and [0] means the occurrence of at least one '0' : x…
Qbik
  • 5,885
  • 14
  • 62
  • 93
-1
votes
1 answer

Can't install pandas-dedupe on Windows Python 3.9

Running pip install pandas-dedupe, I get the following error: I tried manually installing python-Levenshtein first and got the same problem with the addition . What can I do?
Corram
  • 233
  • 1
  • 3
  • 13
-1
votes
1 answer

Speeding up fuzzy match on large list

I am working on a project that uses fuzzy logic on a list of names that could go about 100,000 unique records. On the recent screening that we have conducted, the functions that we use can complete a single name within 2.20 seconds on average. This…
jsv
  • 105
  • 1
  • 8
-1
votes
2 answers

Replace values in a column with similar values in another column with different size - Python

I have a dataframe with different values in a column (about 6,000 rows), which I need to replace with similar (but different) values found in another dataframe, which has fewer rows. Store Values to replace Store A 05/15/21 Store…
Eduardo
  • 3
  • 3
-1
votes
1 answer

Delete "almost duplicates" rows of string based on fuzzy matching with a lot of lines (>50 000)

I have 50 000 words like : add to add chicken a chicken eat the chicken to eat ... And I want to drop the lines which have a high fuzzy similarity with other lines. Then the output should be: add to eat chicken ... I can't calculate every fuzzy…
-1
votes
1 answer

Similarity between lists of floats

I have a list of floats that I want to compare to other lists and get the similarity ratio in python : The list that I want to compare: [0.0000,0.0003,-0.0001,0.0002, 0.0001,0.0003,0.0000,0.0000, -0.0002,0.0002,-0.0002,0.0002,…
Elyes Lounissi
  • 405
  • 3
  • 12
-1
votes
2 answers

SQL Left Fuzzy Join with Levenshtein Distance

I have two data sets from two different systems being merged together within SQL, however, there is a slight difference within the naming conventions on the two systems. The change in convention is not consistent across the larger data sample but…
tg00222
  • 3
  • 3
-1
votes
2 answers

Does the Levenshtein distance algorithm perform better than the Needleman-Wunsch algorithm?

I know that both Levenshtein and Needleman-Wunsch have a time complexity of O(N*M), but I was curious to know which one performs better than the other and why?