Questions tagged [levenshtein-distance]

A metric for measuring the amount of difference between two sequences. The Levenshtein distance allows deletion, insertion and substitution.

In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences. The Levenshtein distance between two strings is defined as the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one string into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.

Levenshtein distance is one specific kind of edit distance: the one whose allowed operations are insertion, deletion and substitution.
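A minimal Python sketch of the usual dynamic-programming computation (the Wagner-Fischer table; the function name `levenshtein` is our own choice, not from any library). Cell `d[i][j]` holds the distance between the first `i` characters of `s` and the first `j` characters of `t`:

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn s into t."""
    m, n = len(s), len(t)
    # d[i][j] = distance between s[:i] and t[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters of s[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete s[i-1]
                d[i][j - 1] + 1,         # insert t[j-1]
                d[i - 1][j - 1] + cost,  # substitute (free if the characters match)
            )
    return d[m][n]


print(levenshtein("kitten", "sitting"))  # 3
```

The table takes O(len(s) · len(t)) time and memory; since each row only depends on the previous one, keeping just two rows reduces memory to O(len(t)).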

References:
Wikipedia
RosettaCode
Edit Distance (Wikipedia)
Hirschberg's algorithm (Wikipedia)

967 questions
-1
votes
1 answer

Levenshtein distance algorithm on Spark

I'm starting with the Hadoop ecosystem, I'm facing some questions, and I need your help. I have two HDFS files and need to compute the Levenshtein distance between a group of columns in the first one and another group in the second one. This process will…
-1
votes
2 answers

How do I get rid of an AttributeError when running fuzzywuzzy?

I'm trying to compare 2 lists and get a distance ratio for each item on the list. My code below returned an AttributeError: 'Series' object has no attribute 'fuzz'. How do I fix this? 'differences' is a result from my earlier code for a list of…
-1
votes
1 answer

Get common strings between 2 fields (Levenshtein?)

I need to get the common string between 2 fields. Could I do it in SQL (PostgreSQL)? PS: It's a hypothetical percentage that gives me the similarity of fields s2 and s2. Thank you in advance,
betty
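The question doesn't say how the percentage should be defined. One common choice, assumed here, is `1 - distance / length of the longer string`. In PostgreSQL the distance itself is available from the `levenshtein()` function of the `fuzzystrmatch` extension; the sketch below only shows the normalization, in Python, with a hypothetical helper `similarity_percent`:

```python
def levenshtein(s: str, t: str) -> int:
    # Two-row Wagner-Fischer dynamic program
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i] + [0] * len(t)
        for j, ct in enumerate(t, 1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (cs != ct))  # substitution
        prev = cur
    return prev[-1]


def similarity_percent(s1: str, s2: str) -> float:
    """Hypothetical normalization: 100 * (1 - distance / longer length)."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 100.0  # two empty strings are identical
    return 100.0 * (1 - levenshtein(s1, s2) / longest)


print(round(similarity_percent("kitten", "sitting"), 1))  # 57.1
```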
-1
votes
1 answer

Recursive Levenshtein in C ALWAYS gives an infinite loop

Levenshtein in C always gives me an infinite loop. This is my code. I tried many solutions, and I tried storing variables and using pointers, but I always get the infinite loop. I think it's because of the 3 recursive calls, but in the doc of …
zratan
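A recursive definition terminates as long as every call advances through at least one string and both base cases (one string exhausted) are handled. Without caching, however, the three recursive calls make the running time exponential, which on longer inputs can look like a hang. The Python sketch below is not a fix for the asker's C code (the excerpt doesn't show it); it recurses on suffix positions and caches results with `functools.lru_cache`:

```python
from functools import lru_cache


def levenshtein(s: str, t: str) -> int:
    @lru_cache(maxsize=None)
    def dist(i: int, j: int) -> int:
        # Distance between the suffixes s[i:] and t[j:]
        if i == len(s):
            return len(t) - j  # base case: insert the rest of t
        if j == len(t):
            return len(s) - i  # base case: delete the rest of s
        if s[i] == t[j]:
            return dist(i + 1, j + 1)  # characters match: no edit needed
        return 1 + min(
            dist(i + 1, j),      # delete s[i]
            dist(i, j + 1),      # insert t[j]
            dist(i + 1, j + 1),  # substitute s[i] with t[j]
        )
    return dist(0, 0)


print(levenshtein("kitten", "sitting"))  # 3
```

Every call increases `i`, `j`, or both, so the recursion depth is at most `len(s) + len(t)`, and the cache limits the work to one evaluation per `(i, j)` pair.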
-1
votes
1 answer

Clustering in R with Levenshtein distance

I am trying to use k-means clustering with the Levenshtein distance. I am having a hard time interpreting the results. # courtesy: code is borrowed from the other thread listed below with some additions of k-means clustering set.seed(1) …
-1
votes
1 answer

How can I find string variations in dictionary within a distance of 1?

Say you have scanned a document with names on it. Due to mistakes in the scanning process, you want to look up the names in a dictionary. Therefore, you need a function that takes in a possible name and outputs a list with every possible string…
J. Jones
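One way to do this, sketched below under the assumption that names use only lowercase ASCII letters, is to generate every string exactly one insertion, deletion or substitution away from the scanned name and keep those that appear in the dictionary. `edits1` and `candidates` are hypothetical helper names:

```python
import string


def edits1(word: str, alphabet: str = string.ascii_lowercase) -> set[str]:
    """Every string exactly one insertion, deletion or substitution away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    substitutions = {left + c + right[1:]
                     for left, right in splits if right
                     for c in alphabet if c != right[0]}
    inserts = {left + c + right for left, right in splits for c in alphabet}
    return deletes | substitutions | inserts


def candidates(name: str, dictionary: set[str]) -> list[str]:
    """Dictionary entries within Levenshtein distance 1 of name, including name itself."""
    return sorted(({name} | edits1(name)) & dictionary)


names = {"john", "joan", "jon", "johnny", "mary"}
print(candidates("jhon", names))  # ['jon']
```

For a word of length n over an alphabet of k letters this generates at most (2n + 1)k strings, so the set intersection with the dictionary stays cheap. Note that swapped letters ("jhon" vs "john") count as two edits under Levenshtein and are not found at distance 1.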
-1
votes
1 answer

Python variable in brackets and range

What does ([i]+[0]*n) mean, and why are i and 0 in brackets? previous, current = current, [i]+[0]*n And why can't I print the current value on the next line? Like so: previous, current = current, [i]+[0]*n print(current) I have an error:…
Guruku
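In the two-row form of the Levenshtein dynamic program, `[i] + [0] * n` concatenates the one-element list `[i]` with a list of `n` zeros, giving a new row `[i, 0, ..., 0]` of length `n + 1`. The first entry is `i` because deleting `i` characters turns a length-`i` prefix into the empty string; the zeros are placeholders that the inner loop fills in. `previous, current = current, ...` is tuple assignment: the old row becomes `previous` and the fresh row becomes `current` in one step. The sketch below assumes the snippet comes from a function of roughly this shape, since the excerpt only shows one line:

```python
def levenshtein(a: str, b: str) -> int:
    n = len(b)
    # Row 0: turning "" into b[:j] takes j insertions
    current = list(range(n + 1))
    for i in range(1, len(a) + 1):
        # [i] + [0] * n == [i, 0, 0, ..., 0] (length n + 1).
        # The old row moves to `previous`; the new row starts with i.
        previous, current = current, [i] + [0] * n
        for j in range(1, n + 1):
            delete = previous[j] + 1       # delete a[i-1]
            insert = current[j - 1] + 1    # insert b[j-1]
            change = previous[j - 1] + (a[i - 1] != b[j - 1])  # substitute
            current[j] = min(delete, insert, change)
    return current[n]


print([3] + [0] * 4)                     # [3, 0, 0, 0, 0]
print(levenshtein("kitten", "sitting"))  # 3
```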
-1
votes
3 answers

Python: Grouping Similar text from a list

How can I group values from an array with 80% fuzzy matching? combined_list = ['magic', 'simple power', 'matrix', 'simple aa', 'madness', 'magics', 'mgcsa', 'simple pws', 'seek', 'dour', 'softy'] yields: ['magic, magics'], ['simple pws',…
Led
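A simple greedy approach, sketched here under an assumption about what "80%" means: treat two strings as similar when `1 - distance / len(longer)` is at least 0.8, and add each word to the first group whose first member it is similar to. Fuzzy-matching libraries such as fuzzywuzzy may compute their scores differently, so the same threshold can group words differently. The helper names are hypothetical:

```python
def levenshtein(s: str, t: str) -> int:
    # Two-row Wagner-Fischer dynamic program
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i] + [0] * len(t)
        for j, ct in enumerate(t, 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, lower as the edit distance grows."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def group_similar(words: list[str], threshold: float = 0.8) -> list[list[str]]:
    groups: list[list[str]] = []
    for word in words:
        for group in groups:
            # Compare against the group's first member (its representative)
            if similarity(word, group[0]) >= threshold:
                group.append(word)
                break
        else:  # no similar group found: start a new one
            groups.append([word])
    return groups


print(group_similar(["magic", "magics", "matrix", "seek"]))
# [['magic', 'magics'], ['matrix'], ['seek']]
```

The result depends on input order, because each group is represented by whichever member arrived first.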
-1
votes
1 answer

Improve performance while using levenshtein and soundex algorithms on search in MySQL

We are trying to upload data from Excel to a database. Before uploading, we would like to preview the data with the count of each match status (e.g. No match, Similar match, Exact match) when compared with our database. The below query is taking 3 minutes…
-1
votes
1 answer

HTML pages comparison - Levenshtein distance

My task is to compare the content of two HTML pages, i.e. how different they are from each other. By difference I mean how different or identical the two pages are with respect to divs, imgs, content, and other tags (all differences a user can visually…
Junaid
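Levenshtein distance works on any sequences, not only strings. One possible way to compare page structure, assumed here, is to extract each page's sequence of start tags with the standard-library `html.parser` and compute the edit distance between the two tag lists. This compares markup only; text content, attributes and CSS-driven visual differences are ignored. `TagCollector` and `tag_sequence` are hypothetical names:

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records start-tag names in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html: str) -> list[str]:
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


def levenshtein(a, b) -> int:
    # Same two-row DP as for strings; elements are compared with !=
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i] + [0] * len(b)
        for j, y in enumerate(b, 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y))
        prev = cur
    return prev[-1]


page1 = "<div><img src='a.png'><p>Hello</p></div>"
page2 = "<div><p>Hello</p><p>World</p></div>"
print(tag_sequence(page1), tag_sequence(page2))  # ['div', 'img', 'p'] ['div', 'p', 'p']
print(levenshtein(tag_sequence(page1), tag_sequence(page2)))  # 1
```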
-1
votes
1 answer

Need help speeding up a calculation

I wrote a program that generates an MD5 hash onto a printed-out bill. I want to be able to check the hash against a generated list of hashes. I then use a Levenshtein distance function to figure out which hash has the lowest edit distance from the…
mawnch
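One possible speed-up, sketched here with hypothetical helpers, is to stop computing a candidate's distance as soon as it can no longer beat the best match found so far. The minimum of a DP row never decreases from one row to the next, so once every entry in a row exceeds the cutoff, the final distance must exceed it too:

```python
from typing import Iterable, Optional, Tuple


def bounded_levenshtein(s: str, t: str, limit: int) -> Optional[int]:
    """Levenshtein distance if it is <= limit, otherwise None."""
    if abs(len(s) - len(t)) > limit:
        return None  # the length difference alone needs more than `limit` edits
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i] + [0] * len(t)
        for j, ct in enumerate(t, 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct))
        if min(cur) > limit:
            return None  # row minima never decrease, so the result would exceed limit
        prev = cur
    return prev[-1] if prev[-1] <= limit else None


def nearest(query: str, candidates: Iterable[str]) -> Optional[Tuple[str, int]]:
    """Closest candidate and its distance (the first one wins on ties)."""
    best: Optional[Tuple[str, int]] = None
    for cand in candidates:
        # No cutoff for the first candidate; afterwards only strictly better ones matter
        limit = max(len(query), len(cand)) if best is None else best[1] - 1
        if limit < 0:
            break  # an exact match was already found
        d = bounded_levenshtein(query, cand, limit)
        if d is not None:
            best = (cand, d)
    return best


print(nearest("d41d8cd9", ["d41d8cd8", "aaaaaaaa", "d41d8cd9"]))  # ('d41d8cd9', 0)
```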
-1
votes
1 answer

R - C++ code reproduction in Rcpp Math.Min issue

I am trying to reproduce some C++ code that I found here about the LevenshteinDistance. More precisely, I am trying to reproduce the part starting with static int LevenshteinDistance(string s, string t) until return d[n, m]; } However, I am…
giac
-1
votes
1 answer

Levenshtein Distance for a List

I want to divide my word list into some number of clusters using Levenshtein Distance. data = pd.read_csv("data.csv") Target_Column = data["words"] Target = Target_Column.tolist() clusters = defaultdict(list) threshold =5 numb =…
Ajay Jadhav
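A sketch of one possible approach (not the asker's truncated code): link every pair of words whose distance is at most `threshold` and take the connected components with a small union-find. This is single-linkage clustering, so words can share a cluster through intermediate words even when their own distance exceeds the threshold. The `cluster` helper is hypothetical:

```python
from collections import defaultdict
from itertools import combinations


def levenshtein(s: str, t: str) -> int:
    # Two-row Wagner-Fischer dynamic program
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i] + [0] * len(t)
        for j, ct in enumerate(t, 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct))
        prev = cur
    return prev[-1]


def cluster(words: list[str], threshold: int) -> list[list[str]]:
    """Single-linkage clusters: words within `threshold` edits are linked."""
    parent = list(range(len(words)))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(words)), 2):
        if levenshtein(words[i], words[j]) <= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = defaultdict(list)
    for i, word in enumerate(words):
        clusters[find(i)].append(word)
    return list(clusters.values())


print(cluster(["cat", "bat", "bar", "dog", "dot"], threshold=1))
# [['cat', 'bat', 'bar'], ['dog', 'dot']]
```

In the example, "cat" and "bar" (distance 2) end up together through "bat". Every pair is compared, so the cost grows quadratically with the number of words.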
-1
votes
1 answer

Calculate levenshteinDist between rownames and colnames using mapply

I want to calculate the levenshteinDist distance between the rownames and colnames of a matrix using the mapply function, because my matrix is too big and a nested "for" loop takes a very long time to give me the result. Here's the old…
Sarah
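Whatever looping construct is used, an all-pairs comparison of r row names against c column names needs r × c distance computations. (`mapply` over two vectors pairs the i-th row name only with the i-th column name, which is a different computation.) A Python sketch of the all-pairs matrix, with a hypothetical `distance_matrix` helper:

```python
def levenshtein(s: str, t: str) -> int:
    # Two-row Wagner-Fischer dynamic program
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i] + [0] * len(t)
        for j, ct in enumerate(t, 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct))
        prev = cur
    return prev[-1]


def distance_matrix(row_names: list[str], col_names: list[str]) -> list[list[int]]:
    """m[i][j] = Levenshtein distance between row_names[i] and col_names[j]."""
    return [[levenshtein(r, c) for c in col_names] for r in row_names]


rows = ["apple", "apply"]
cols = ["ample", "apple", "maple"]
print(distance_matrix(rows, cols))  # [[1, 0, 2], [2, 1, 3]]
```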
-1
votes
1 answer

Set edit limit in python Levenshtein

I have millions of words in list A and about 100 in list B. I would like to find all the items in set A that look like items in set B. I'm using the Python Levenshtein library, which is written in C, and it works quite well. But 99% of comparisons…
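The edit limit can be applied in two places, sketched below in pure Python (slower per comparison than a C library; the helper names are hypothetical): skip any pair whose lengths differ by more than the limit, since each unit of length difference costs at least one edit, and abandon the DP as soon as a whole row exceeds the limit. Bucketing list B by length means each word in A is only compared with words of compatible length:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def bounded_levenshtein(s: str, t: str, limit: int) -> Optional[int]:
    """Levenshtein distance if it is <= limit, otherwise None."""
    if abs(len(s) - len(t)) > limit:
        return None  # too many insertions/deletions needed
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i] + [0] * len(t)
        for j, ct in enumerate(t, 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct))
        if min(cur) > limit:
            return None  # row minima never decrease: stop early
        prev = cur
    return prev[-1] if prev[-1] <= limit else None


def matches_within(list_a: Iterable[str], list_b: Iterable[str],
                   limit: int) -> Dict[str, List[str]]:
    """Map each word in list_b to the words in list_a within `limit` edits."""
    by_len = defaultdict(list)
    for b in list_b:
        by_len[len(b)].append(b)
    result = defaultdict(list)
    for a in list_a:
        # Only lengths within `limit` of len(a) can possibly match
        for length in range(len(a) - limit, len(a) + limit + 1):
            for b in by_len.get(length, ()):
                if bounded_levenshtein(a, b, limit) is not None:
                    result[b].append(a)
    return dict(result)


list_a = ["color", "colour", "colander", "flavor", "odour"]
list_b = ["colour", "flavour"]
print(matches_within(list_a, list_b, limit=1))
# {'colour': ['color', 'colour'], 'flavour': ['flavor']}
```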