Questions tagged [levenshtein-distance]

A metric for measuring the amount of difference between two sequences. The Levenshtein distance allows deletion, insertion and substitution.

In information theory and computer science, the Levenshtein distance between two strings is the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one string into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.

Levenshtein distance is one specific member of the broader family of edit distances.
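The definition above can be illustrated with the usual dynamic-programming table. This is a minimal sketch, not a reference implementation; the function name `levenshtein` and the full-matrix layout are choices made here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    rows, cols = len(a) + 1, len(b) + 1
    # dist[i][j] holds the distance between the prefixes a[:i] and b[:j].
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i characters of a[:i]
    for j in range(cols):
        dist[0][j] = j  # insert all j characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[-1][-1]


print(levenshtein("kitten", "sitting"))  # 3
```

The table uses O(len(a) * len(b)) time and memory; only the previous row is needed at each step, so memory can be reduced if the full table is not required.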

967 questions
0 votes, 3 answers

Finding similar strings with restricted alpha characters using Python

I want to group similar strings, but I would like the grouping to be smart enough to recognize when strings differ only in convention characters like '/' or '-' rather than in letters. Given the following input: moose mouse mo/os/e m.ouse alpha = ['/','.'] I want to group…
hurturk
0 votes, 0 answers

Weighted costs Levenshtein Algorithm for R

I am trying to compare two lists of company names using the fuzzy string matching Levenshtein algorithm. I already implemented my own version in Python, but despite using numpy and other tricks to try and speed it up, Python is slow and on my data…
ctrl-z pls
0 votes, 2 answers

Return result of a MySQL function

I have installed this Levenshtein function in MySQL. The maker advises using this: select levenshtein('butt', 'but') from test; select levenshtein_ratio('butt', 'but') from test; I want to calculate the Levenshtein Ratio between $search and each…
Tonald Drump
0 votes, 1 answer

Minimum Levenshtein distance across multiple words

I am trying to do some string matching using the Levenshtein algorithm to find the closest words for businesses. (In Python, but the language won't make a huge difference.) An example query would be search = 'bna' lat & lon are close by the result I am looking…
0 votes, 0 answers

MySQL Levenshtein distance as a query instead of UDF

I have a case where I need to calculate the Levenshtein distance between two columns for a row in MySQL. There are UDFs available for this, but I need to do this without a UDF. The reason for this is that I am using MemSQL, which is an extremely…
user396404
0 votes, 0 answers

Efficient kNN graph construction with deferred selection of k

Using Levenshtein distance as a metric, I want to find the exact k-nearest neighbours for all elements in a large set of strings, but I am not yet sure how high a value of k I will require. Is there an algorithm or data structure that will allow me…
Aramdooli
0 votes, 1 answer

Matching strings using Levenshtein distance and heuristics

I have string patterns ('rules'), in 'categories'. e.g.: Category1 lorem ipsum dolor sit amet consectetur adipiscing elit fusce sit amet ante nisi lorem ut sem interdum molestie suspendisse non lorem ut sem interdum molestie Category2 vivamus…
0 votes, 1 answer

In R, trying to calculate Levenshtein distance of strings in a column then cluster and label by another column

Here is a truncated version of my data set. There are many more rows in the full set. I know I can convert the second column to a vector via as.vector(df[,2]), which I can then use for distance calculation. Once I have the distances, I'm going to…
0 votes, 2 answers

Prolog raises out of local stack for no good reason

I'm trying to implement Levenshtein distance in Prolog. The implementation is pretty straightforward: levenshtein(W1, W2, D) :- atom_length(W1, L1), atom_length(W2, L2), lev(W1, W2, L1, L2, D), !. lev(_, _, L1, 0, D) :- D is L1,…
s1ddok
0 votes, 2 answers

Levenshtein distance in Python giving only 1 as edit distance

I have a Python program that reads two lists (one with errors and the other with correct data). Every element in the list with errors needs to be compared with every element in the correct list. After comparing, I get all the edit distances between every…
0 votes, 1 answer

Why does the edit distance definition in the Stanford NLP course add 2 instead of 1?

I am studying the Stanford NLP course using the following slides: https://web.stanford.edu/class/cs124/lec/med.pdf. The edit distance algorithm is defined in these slides as follows: Initialization D(i,0) = i D(0,j) = j Recurrence Relation: For…
tktktk0711
0 votes, 1 answer

Levenshtein edit distance Python

This piece of code returns the Levenshtein edit distance of two terms. How can I make it so that insertion and deletion cost only 0.5 instead of 1? Substitution should still cost 1. def substCost(x,y): if x == y: return 0 else: …
Brutalized
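The weighting asked about in this question (insertion and deletion at 0.5, substitution at 1) can be sketched by making the per-operation costs parameters of the recurrence. This is an illustrative sketch only, not the asker's code; the function name `weighted_edit_distance` and its parameter names are hypothetical.

```python
def weighted_edit_distance(a, b, ins_cost=0.5, del_cost=0.5, sub_cost=1.0):
    """Edit distance with configurable per-operation costs (hypothetical helper)."""
    # prev[j] is the cost of turning the current prefix of a into b[:j].
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * del_cost]  # turn a[:i] into the empty string
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + del_cost,                             # delete ca
                curr[j - 1] + ins_cost,                         # insert cb
                prev[j - 1] + (0.0 if ca == cb else sub_cost),  # substitute or match
            ))
        prev = curr
    return prev[-1]


print(weighted_edit_distance("kitten", "sitting"))  # 2.5
```

With these particular weights a deletion plus an insertion costs the same as one substitution, so the result equals half the number of insertions and deletions needed when substitutions are not used at all.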
0 votes, 1 answer

Create a new table column with closest string match from another table

I have two lists of names for locations, with slightly different spelling, capitalisation, etc. I'm trying to match each site in the first list to the most similar one in the second list. SELECT name1, name2 FROM table1, table2 WHERE…
Jamie Bull
0 votes, 1 answer

Iterative version of Damerau–Levenshtein distance

Levenshtein distance can be computed iteratively using two rows this way: https://en.wikipedia.org/wiki/Levenshtein_distance#Iterative_with_two_matrix_rows I came across the Optimal String alignment distance that does take into account the…
jeffreyveon
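The two-row iteration mentioned in this question, and one way to extend it to optimal string alignment (OSA), can be sketched as follows. The function names are hypothetical, and the three-row OSA variant is an assumption about how the extension might look, not the answer given on the page. Note that OSA is the restricted form of Damerau-Levenshtein distance: it counts adjacent transpositions but never edits the same substring twice.

```python
def levenshtein_two_rows(a: str, b: str) -> int:
    """Levenshtein distance keeping only the previous and current rows."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i]
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def osa_three_rows(a: str, b: str) -> int:
    """Optimal string alignment distance; needs one extra row for transpositions."""
    prev2 = None  # row i - 2, only read once i > 1
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i]
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            best = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
            # Adjacent transposition: a[i-2:i] is b[j-2:j] with the two characters swapped.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                best = min(best, prev2[j - 2] + 1)
            curr.append(best)
        prev2, prev = prev, curr
    return prev[-1]


print(levenshtein_two_rows("ab", "ba"))  # 2
print(osa_three_rows("ab", "ba"))        # 1
```

The transposition case looks two rows back, which is why the OSA version keeps three rows instead of two.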
0 votes, 2 answers

Levenshtein compare strings without changing numbers

I'm looking for a method to find similar symbol names, where those names are often a combination of text and numbers, like "value1", "_value2", "test_5" etc. Now to find similar names I tried using the Levenshtein distance, but for the algorithm the…
Robin