Questions tagged [levenshtein-distance]

A metric for measuring the amount of difference between two sequences. The Levenshtein distance allows deletion, insertion and substitution.

In information theory and computer science, the Levenshtein distance between two strings is the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one string into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.

Levenshtein distance is one specific member of the family of edit distances.
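The definition above can be illustrated with a minimal Python sketch. It uses the standard two-row dynamic programming approach (often called Wagner-Fischer); the function name `levenshtein` is chosen here for illustration and is not taken from any particular library.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the rows short
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

This runs in time proportional to the product of the two lengths and keeps only two rows in memory. Note that a transposition such as "Brooklny" versus "Brooklyn" counts as two edits under this definition.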


967 questions


1 answer

Find minimum Levenshtein Distance between one word and an array of thousands

So my users wrote their addresses in a registration form, but a lot of them have typos. I have another list retrieved from the city records with the correct spelling of those addresses. So let's say I have "Brooklny" typed by them and I have the…

asked Jan 29 '16 at 03:04

Rodolfo Oocampo


1 answer

Why is the Levenshtein distance score so low for these two strings?

I am using a Levenshtein distance algorithm to find similar strings, and I currently have my acceptance score set to 12 (because some of my strings have up to 5 words). But I was surprised to see the two strings below get a score of 11; they seem…

java string groovy levenshtein-distance fuzzy-comparison

asked Jan 23 '16 at 02:29

AbuMariam


0 answers

Levenshtein distance with multiple comparisons

Currently I am trying to create a "best match" query. I came across this answer, but the main difference is that I have a table with more columns, and I need to compare 6 strings. Is there a way to implement the Levenshtein distance algorithm with a…

sql sql-server-2008 string-comparison levenshtein-distance

asked Jan 12 '16 at 20:13

terbubbs


1 answer

PHP Levenshtein similarity on combobox with ID

I am trying to match an entry in a combobox holding a list of categories against the first similar word found in a record. For example: input a quote, "the sun is great and nobody can see it directly", then explode each word into an array: "the", "sun",…

php combobox words levenshtein-distance

asked Dec 21 '15 at 19:02

CristJian Cordero Loor


2 answers

Speeding up Levenshtein distance calculation in Ionic app

What I'm doing: I'm developing a mobile dictionary app for a number of languages. How I'm doing it: using the Ionic framework with a combination of some Angular and some pure JS (imported from a working online dictionary site for the same languages). The…

javascript performance ionic-framework nlp levenshtein-distance

asked Dec 16 '15 at 19:01

A. Pine


1 answer

closest string match for comparing OCR results

I'm OCRing a few sample images. I have manually read the text contained in these images and stored it in a separate text file. I'm looking to test my OCR success rate. So, I'm looking for an algorithm that would tell me a success percentage when…

python algorithm language-agnostic string-comparison levenshtein-distance

asked Oct 12 '15 at 02:51

Anthony


1 answer

How to efficiently find all fuzzy matches between a set of terms and a list of sentences?

I have a list of sentences (e.g. "This is an example sentence") and a glossary of terms (e.g. "sentence", "example sentence") and need to find all the terms that match the sentence with a cutoff on some Levenshtein ratio. How can I do it fast…

python full-text-search levenshtein-distance fuzzy-comparison

asked Sep 08 '15 at 17:12

x3al


2 answers

Adding exceptions to Levenshtein-Distance-like algorithm

I'm trying to compute how similar sequences of up to 6 variables are. Currently I'm using a Collections Counter to return the frequency of different variables as my edit distance. By default, the distance for editing a variable (add/sub/change) is…

python algorithm counter levenshtein-distance edit-distance

asked Sep 08 '15 at 13:31

Luis


2 answers

LevenshteinDistance Method does not provide the most accurate result

I have a file with an "X" number of names. I need to match each of those names against another file and see if said name is among them, but written in a different way ("Verizon" -> "Verizon LTD"). I was doing this with the "Fuzzy Lookup" interop…

c# levenshtein-distance fuzzy-search

asked Aug 07 '15 at 13:20

Patrick


0 answers

MySQL Similar values in VARCHAR column

mysql node.js levenshtein-distance sentence-similarity

asked Jul 07 '15 at 17:12

Ananth


1 answer

perl custom sort by string similarity clustering

In Perl, I would like to sort a collection of different length strings in a way that automatically lumps together similar strings. Intuitively, I imagine I need some distance measure for each pair and then a clustering routine that groups by the…

perl sorting levenshtein-distance

asked Jun 22 '15 at 15:44

719016


1 answer

R - stringdist cost setting error

I get an error when I try to set the operation costs in stringdist. Any ideas why? library(stringdist) seq = rbind( c('aaa'), c('aba'), c('aab'), c('ccc') ) This works perfectly (Levenshtein distance): stringdistmatrix(a = seq, b…

r string-matching levenshtein-distance stringdist

asked Jun 20 '15 at 15:01

giac


2 answers

How to get most important occurrences from an array?

First of all, this is not a language-specific question; the example below uses PHP, but it's more about the method (regex?) to find the answer. Let's say I have an array: $array = ['The Bert and Ernie game', 'The Bert & Ernie game', 'Bert and Ernie…

regex levenshtein-distance

asked Jun 17 '15 at 09:34

Bob van Luijt


1 answer

What indexer do I use to find the list in the collection that is most similar to my list?

Let's say I have my list of ingredients: {'potato','rice','carrot','corn'} and I want to return lists from a database that are most similar to mine:…

search indexing solr levenshtein-distance

asked Jun 12 '15 at 09:26

JaseC


0 answers

How do we ignore the order of letters in calculating Levenshtein distance?

This question is not new, and I have seen some form of explanation here and here. Both methods described perform n-gram (mostly bigram) calculations on the terms of query 1 and query 2 and then find the cosine similarity. I was hoping for a…

python levenshtein-distance tf-idf cosine-similarity edit-distance

asked Jun 11 '15 at 00:45

jxn

