Questions tagged [levenshtein-distance]

A metric for measuring the amount of difference between two sequences. The Levenshtein distance allows deletion, insertion and substitution.

In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences. The Levenshtein distance between two strings is defined as the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one string into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.

Levenshtein distance is one specific kind of edit distance.
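The definition above can be sketched with the classic dynamic-programming recurrence. This is a minimal illustration, not a reference implementation: the function name `levenshtein` and the two-row memory layout are choices made for this example, and every edit is assumed to cost 1.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (every edit costs 1)."""
    # The distance is symmetric, so keep the shorter string in the
    # inner loop to bound memory by the shorter length.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # current[0] is the distance between a[:i] and the empty string.
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # drop ca from a
                current[j - 1] + 1,      # insert cb into a
                previous[j - 1] + cost,  # substitute ca with cb, or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept, so memory use is proportional to the shorter string, while running time stays proportional to the product of the two lengths.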

References:
Wikipedia
RosettaCode
Edit Distance (Wikipedia)
Hirschberg's algorithm (Wikipedia)

967 questions
0
votes
1 answer

Disambiguation of Names using Edit Distance

I have a huge list of company names and a huge list of zipcodes associated with those names. (>100,000). I have to output similar names (for example, AJAX INC and AJAX are the same company, I have chosen a threshold of 4 characters for edit…
user1773010
  • 107
  • 7
0
votes
1 answer

What are the differences between insertion, deletion and substitution

I am about to tackle a programming problem about the Levenshtein distance. According to the definition given on my sheet, the Levenshtein distance is equal to the number of substitutions, insertions and deletions between two…
entropy
  • 169
  • 3
  • 13
0
votes
0 answers

Levenshtein distance combined with SQL LIKE

I'm trying out the functionality of Levenshtein: MySQL + PHP. However, my query doesn't return results: SELECT * FROM products WHERE levenshtein('%".$search."%', `name`) < 5 OR levenshtein('%".$search."%', `series`) < 5 OR levenshtein('%".$search."%',…
0
votes
1 answer

How to count the number of changes to a string with JavaScript

I am trying to count the number of changes to one field (one string) with JavaScript. For example, for name = MARTIN: MARTI => 1 change, MARTINE => 1 change, MATRIN => 2 changes, MARBOM => 3 changes
Pavel Kenarov
  • 944
  • 1
  • 9
  • 21
0
votes
1 answer

relocation R_X86_64_32 against `.rodata' can not be used when making a shared object; recompile with -fPIC

My server is on Ubuntu 12.04. I'm trying to install the Levenshtein-MySQL-UDF available here: https://github.com/jmcejuela/Levenshtein-MySQL-UDF I downloaded the .zip and located the levenshtein.c file here on my server:…
0
votes
1 answer

PHP Levenshtein on Query Result

I want to perform a levenshtein on a mysql query result. The query looks like this: $query_GID = "select `ID`,`game` from `gkn_catalog`"; $result_GID = $dbc->query($query_GID); $row_GID = mysqli_fetch_array($result_GID,MYSQLI_ASSOC); And here I…
SubZero
  • 113
  • 7
0
votes
2 answers

O(n) or faster algorithm for sorting a list by levenshtein distance?

Is there an O(n) or faster algorithm for sorting a list by Levenshtein distance? I've looked at some solutions on SO, but all of them invoke traditional sorting. Now, suppose you just sum the bytes of your input: you'll get hash keys that are pretty…
MaiaVictor
  • 51,090
  • 44
  • 144
  • 286
0
votes
1 answer

Using levenshtein search for multiple words

Is it possible for levenshtein search to check all words in a search query against an array? The code is as follows: $input = $query; // array of words to check against $words = $somearray; // no shortest distance found, yet …
Javier Brooklyn
  • 624
  • 3
  • 9
  • 25
0
votes
0 answers

Using pl/sql output in WHERE-statement

I have a little question. So I'm using the Levenshtein-score to search for a comparison of more than 85% between street names in two different tables. But when I use my Levenshtein-score calculation in my WHERE-statement, I get as output for example…
NYannickske
  • 131
  • 1
  • 1
  • 4
0
votes
1 answer

Fuzzy matching for every query term in Solr

With the Levenshtein implementation of Lucene 4 claiming to be 100 times faster than before (http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html) I would like to do fuzzy matching of all terms in a query. The idea is…
Georg M. Sorst
  • 264
  • 4
  • 13
0
votes
2 answers

Damerau-Levenshtein distance for words

I am looking for such an algorithm, but one that makes substitutions between words and not between letters. Is there such an algorithm? I am looking for an implementation with SQL Server, but the name of the algorithm will be good enough.
Megetron
0
votes
1 answer

levenshtein algorithm parallel

I've implemented the algorithm using parallel_for. But mostly I use synchronized sections, so I have no profit. Maybe there is a better option? tbb::parallel_for (tbb::blocked_range(1, m * n), apply_transform(d, j, this, m, n)); void…
Alex A. Renoire
  • 361
  • 1
  • 2
  • 19
0
votes
1 answer

Replace mis-spelt word within a string

I have a basic search script which I'm working on. I want users to be able to enter several keywords. If one of these keywords is misspelt, I want to change that word for the search results and/or display a "did you mean ..." message. I have tried…
0
votes
1 answer

comparing strings and comparing how close they match

I extract exceptions from a log, here is an example of one: Exception: System.InvalidOperationException: Collection was modified; enumeration operation may not execute. at System.Collections.Generic.List`1.Enumerator.MoveNextRare() at…
user1547410
  • 863
  • 7
  • 27
  • 58
0
votes
3 answers

good metrics for array of strings distance

I have two arrays, S and T, containing words (lowercased, trimmed, without diacritics). The number of words can differ. (Most of the data is a kind of proper name, rather short (<5).) I need to find a good metric (and its implementation, or maybe…
ts.
  • 10,510
  • 7
  • 47
  • 73