Questions tagged [levenshtein-distance]

A metric for measuring the amount of difference between two sequences. The Levenshtein distance allows deletion, insertion and substitution.

In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences. The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.

Levenshtein distance is a specific algorithm of edit distance algorithms.

References:
Wikipedia
RosettaCode
Edit Distance (Wikipedia)
Hirschberg's algorithm (Wikipedia)

967 questions

votes

1 answer

Filter duplicates from list based on Levenshtein distance

Let's say that I have a list of JSONs like in the example. Of those that have duplicate title attribute (as determined by scoring over a certain threshold of Levenshtein distance), I'd like to filter out the duplicates that do not have the minimum…

asked Nov 30 '18 at 17:12

Chris C

votes

1 answer

Is this Levenshtein Distance algorithm correct?

I have written the algorithm below to compute the Levenshtein distance, and it seems to return the correct results based on my tests. The time complexity is O(n+m), and the space is O(1). All the existing algorithms I've seen only for this have…

c# algorithm distance levenshtein-distance

asked Nov 28 '18 at 01:55

dwally89

votes

1 answer

R - return n matches via levenshtein distance

I would like to find the n best matches to a given string via levenshtein distance. I know that the adist function in R gives the minimal distance, but I am attempting to scale the number of results to, say, 10. I have some code below. name <-…

r dataframe tm levenshtein-distance stringdist

asked Nov 02 '18 at 21:21

jvalenti

votes

2 answers

Getting the closest string match (with possibly very different string sizes)

I am looking for a way to find the closest string match between two strings that could eventually have a very different size. Let's say I have, on the one hand, a list of possible locations like: Yosemite National Park Yosemite Valley Yosemite…

algorithm language-agnostic string-comparison string-matching levenshtein-distance

asked Aug 24 '18 at 23:39

LBes

3,366
1
32
66

votes

1 answer

Is specific Lucene classes are intended to be consumed by applications?

I'm new to the Apache Lucene library. I'd like to directly consume a class in this library called: LevenshteinDistance to calculated similarity search between strings. Would that be correct for my own application to directly consume it, or should I…

lucene levenshtein-distance

asked Aug 23 '18 at 14:38

aQ123

votes

1 answer

Why Levenshtein distance from "abcd" to "badc" is 3?

Starting from "abcd", I want to go to "badc" : I remove "a" -> "bcd"; I insert "a" at the right position -> "bacd"; I remove "c" -> "bad"; I insert "c" at the right position -> "badc". It's 4 operations. I can't find out a shorter way to do it.…

levenshtein-distance

asked Aug 09 '18 at 10:20

jules

votes

0 answers

Scikit Learn for clustering mixed data (numeric & categorical)

Can someone please help modify the working example below to create clusters from the shared data? The example uses Mean Shift clustering from Scikit-Learn to identify patches of similar/co-located plant species in an agronomical facility. Similar…

python machine-learning scikit-learn levenshtein-distance mean-shift

asked Jul 03 '18 at 19:34

Kdog

votes

0 answers

Is there a way to filter what the python-levenshtein extension changes?

Ive got a large list of names (strings) that I have to check against each other to see if there are any typos. To do this I've been using the pypi python-Levenshtein extension against the iterated list, with a typo being considered as a comparison…

python levenshtein-distance

asked Jun 13 '18 at 00:02

AngusOld

votes

1 answer

Creating Multiple Distance Matrix using Levensthein Algorithm on C++

I am developing a programme about string analyses. I've done levensthein distance algorithm but it is just applicable for two strings. Now, I want to run more strings than two, concurrently. Like, from text file; S1 : "0123412315", S2 :…

c++ algorithm levenshtein-distance

asked May 24 '18 at 20:00

Ugur Ç.

votes

1 answer

php find correct closest word

as we khow, we can find closest words by levenshtein for example:

php levenshtein-distance

asked May 23 '18 at 08:55

DolDurma

15,753
51
198
377

votes

3 answers

fuzzywuzzy to normalize string in pandas column

I have a dataframe like this now i want to normalize the string in the 'comments' column for the word 'election' . I tried using fuzzywuzzy but wasn't able to implement it on pandas dataframe to partially match the word 'election'. The output…

python pandas dataframe levenshtein-distance fuzzywuzzy

asked Apr 26 '18 at 13:12

nOObda

votes

1 answer

string matching irrespective of order of words and short forms in R : fuzzy string matching in R

I am new to R, and want to compare 2 strings(addresses) where Word order could be different, other than numbers. (Consecutive numbers need to be in same order) Words could be at times in short form, eg street could be st., North West could be…

r regex string string-matching levenshtein-distance

asked Apr 18 '18 at 03:32

Aditya Kuls

votes

1 answer

Javascript Version of (MDLD) Modified Damerau-Levenshtein Distance Algorithm

I was looking to test the performance of MDLD for some in-browser string comparisions to be integrated into a web-app. The use-case involves comparing strings like, "300mm, Packed Wall" and "Packed Wall - 300mm", so I was looking for fuzzy string…

javascript stored-procedures levenshtein-distance

asked Apr 17 '18 at 06:35

Lovethenakedgun

votes

0 answers

Best algorithm suited for finding most similar town

I have a dataset full of towns and their geographical data and some input data. This input data is, almost always, a town as well. But it being towns scraped off of the internet, they can be a little bit misspelled or spelled differently. e.g. Saint…

string-matching similarity ranking levenshtein-distance fuzzy-search

asked Apr 05 '18 at 09:09

Adriaan Vermeire

votes

1 answer

match two datasets with record linkage in R

I am trying to match two datasets in R: datasetA and datasetB. These datasets contain the following columns. datasetA ID: 15 Name: peter sanders First_Name: peter Last_Name: sanders ORG_NAME:coffee&cake City: New York Amount(USD): 10369…

r compare matching levenshtein-distance record-linkage

asked Mar 15 '18 at 13:44

Alice M

Prev 1 2 3

…

64 65 Next