Questions tagged [levenshtein-distance]

A metric for measuring the amount of difference between two sequences. The Levenshtein distance allows deletion, insertion and substitution.

In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences. The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.

Levenshtein distance is a specific algorithm of edit distance algorithms.

References:
Wikipedia
RosettaCode
Edit Distance (Wikipedia)
Hirschberg's algorithm (Wikipedia)

967 questions
0
votes
1 answer

Find minimum Levenshtein Distance between one word and an array of thousands

So my users wrote their addresses in a registration form, but a lot of them have typos. I have another list retrieved from the city records with the correct spelling of those addresses. So let's say I have "Brooklny" typed by them and I have the…
0
votes
1 answer

Why is the Levenshtein distance score so low for these two strings?

I am using a Levenshtein distance algorithm to find similar strings and I currently have my score for acceptance as 12 (because some of my strings have up to 5 words). But I was suprised to see the below two strings get a score of 11, they seem…
AbuMariam
  • 3,282
  • 13
  • 49
  • 82
0
votes
0 answers

Levenshtein distance with multiple comparisons

Currently I am trying to create a "best match" query. I came across this answer, but the main difference is that I have a table with more columns, and I need to compare 6 strings. Is there a way to implement the Levenshtein distance algorithm with a…
terbubbs
  • 1,512
  • 2
  • 25
  • 48
0
votes
1 answer

php levenstein similarity on combobox with id

I tried to make a coincidence in a combobox with a list of categories with the first similar word that find on a register, for example: Input a quote: "the sun is great and nobody can see it directly" explode each word in an array "the", "sun",…
0
votes
2 answers

Speeding up Levenshtein distance calculation in Ionic app

What I'm doing: I'm developing a mobile dictionary app for a number of languages How I'm doing it: Using ionic framework with combination of some angular and some pure js (imported from a working online dictionary site of the same languages) The…
0
votes
1 answer

closest string match for comparing OCR results

I'm OCRing few sample images. I have manually read and stored text contained in these images in a separate text file. I'm looking to test my OCR success rate. So, I'm looking for an algorithm that would tell me the a success percentage when…
0
votes
1 answer

How to effeciently find all fuzzy matches between a set of terms and a list of sentences?

I have a list of sentences (e.g. "This is an example sentence") and a glossary of terms (e.g. "sentence", "example sentence") and need to find all the terms that match the sentence with a cutoff on some Levenshtein ratio. How can I do it fast…
x3al
  • 586
  • 1
  • 8
  • 24
0
votes
2 answers

Adding exceptions to Levenshtein-Distance-like algorithm

I'm trying to compute how similar a sequence of up to 6 variables are. Currently I'm using a Collections Counter to return the frequency of different variables as my edit-distance. By default, the distance in editing a variable (add/sub/change) is…
0
votes
2 answers

LevenshteinDistance Method does not provide the most accurate result

I have a file with an "X" number of names, i need to match each of those names against another file and see if said name is amongst them, but written in a different way ("Verizon" -> "Verizon LTD"). I was doing this with a the "Fuzzy Lookup" interop…
Patrick
  • 1
  • 2
0
votes
0 answers

MySQL Similar values in VARCHAR column

I have a database table for storing restaurant names and the city they are located in. Example: name | city eleven madison park | NYC gramercy tavern | NYC Lotus of Siam | TOK The Modern | LA ABC Kitchen …
Ananth
  • 4,227
  • 2
  • 20
  • 26
0
votes
1 answer

perl custom sort by string similarity clustering

In Perl, I would like to sort a collection of different length strings in a way that automatically lumps together similar strings. Intuitively, I imagine I need some distance measure for each pair and then a clustering routine that groups by the…
719016
  • 9,922
  • 20
  • 85
  • 158
0
votes
1 answer

R - stringdist cost setting error

I have an error when I try to set the operations costs in stringdist Any ideas why ? library(stringdist) seq = rbind( c('aaa'), c('aba'), c('aab'), c('ccc') ) This works perfectly (Levensthein distance) stringdistmatrix(a = seq, b…
giac
  • 4,261
  • 5
  • 30
  • 59
0
votes
2 answers

How to get most important occurrences from an array?

First of all, this is not a language specific question, the below example uses PHP but it's more about the method (regex?) to find the answer. Let's say I have an array: $array = ['The Bert and Ernie game', 'The Bert & Ernie game', 'Bert and Ernie…
Bob van Luijt
  • 7,153
  • 12
  • 58
  • 101
0
votes
1 answer

What indexer do I use to find the list in the collection that is most similar to my list?

Lets say I have my list of ingredients: {'potato','rice','carrot','corn'} and I want to return lists from a database that are most similar to mine:…
JaseC
  • 3,103
  • 2
  • 21
  • 22
0
votes
0 answers

How do we ignore the order of letters in calculating Levenshtein distance?

This question is not new and i have seen some form of explanation here and here. Both methods described performing N grams (bigrams mostly) calculations on the terms of query 1 and query 2 and then finding the cosine similarity. I was hoping for a…
jxn
  • 7,685
  • 28
  • 90
  • 172