Questions tagged [levenshtein-distance]

A metric for measuring the amount of difference between two sequences. The Levenshtein distance allows deletion, insertion and substitution.

In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences. The Levenshtein distance between two strings is defined as the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one string into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.

Levenshtein distance is one particular edit distance; other edit distances allow a different set of operations.
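
The definition above can be sketched as a small dynamic program. This is a minimal illustration, not a reference implementation: the function name `levenshtein` is chosen here for the example, and it assumes unit cost for every insertion, deletion, and substitution.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (unit costs assumed)."""
    # The distance is symmetric, so keep the shorter string as the row
    # to bound memory by min(len(a), len(b)).
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the empty string and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # current[0] is the distance between a[:i] and the empty string.
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory is proportional to the shorter string while running time stays proportional to the product of the two lengths.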

References:
Levenshtein distance (Wikipedia)
Levenshtein distance (RosettaCode)
Edit distance (Wikipedia)
Hirschberg's algorithm (Wikipedia)

967 questions

3 answers

Normalizing the edit distance

Can we normalize the Levenshtein edit distance by dividing the edit distance value by the length of the two strings? I am asking because, if we compare two strings of unequal length, the difference between the lengths of the two…

asked Aug 20 '17 at 14:48

Naufal Khalid

11 answers

Fuzzy matching of product names

I need to automatically match product names (cameras, laptops, tv-s etc) that come from different sources to a canonical name in the database. For example "Canon PowerShot a20IS", "NEW powershot A20 IS from Canon" and "Digital Camera Canon PS A20IS"…

string-matching levenshtein-distance fuzzy-search

asked Feb 27 '09 at 15:37

Ash

4 answers

Java fuzzy String matching with names

I've got a stand-alone CSV data loading process that I coded in Java that has to use some fuzzy string matching. It's definitely not ideal, but I don't have much choice. I am matching using a first and last name and I cache all the possibilities at…

java string levenshtein-distance

asked Jan 11 '14 at 02:07

Durandal

4 answers

Where can the documentation for python-Levenshtein be found online?

I've found a great python library implementing Levenshtein functions (distance, ratio, etc.) at http://code.google.com/p/pylevenshtein/ but the project seems inactive and the documentation is nowhere to be found. I was wondering if anyone knows…

python documentation levenshtein-distance

asked Aug 08 '13 at 19:27

Phil B

4 answers

What is a good metric for deciding if 2 Strings are "similar enough"?

I'm working on a very rough, first-draft algorithm to determine how similar 2 Strings are. I'm also using Levenshtein Distance to calculate the edit distance between the Strings. What I'm doing currently is basically taking the total number of edits…

java string-matching levenshtein-distance similarity

asked Dec 09 '11 at 20:53

Hristo

2 answers

Edit distance such as Levenshtein taking into account proximity on keyboard

Is there an edit distance such as Levenshtein which takes into account distance for substitutions? For example, when deciding whether two words match, typo and tylo are really close (p and l are physically close on the keyboard), while typo and…

python levenshtein-distance

asked Mar 24 '15 at 13:27

PascalVKooten

2 answers

Python: String clustering with scikit-learn's dbscan, using Levenshtein distance as metric

I have been trying to cluster multiple datasets of URLs (around 1 million each), to find the original and the typos of each URL. I decided to use the levenshtein distance as a similarity metric, along with dbscan as the clustering algorithm as…

python machine-learning scikit-learn cluster-analysis levenshtein-distance

asked Aug 02 '16 at 12:20

KaziJehangir

4 answers

Edit distance between two graphs

I'm just wondering whether, as with the Levenshtein distance (or edit distance) between two strings, there is something similar for graphs. I mean, a scalar measure that identifies the number of atomic operations (node and edges…

algorithm language-agnostic levenshtein-distance edit-distance

asked May 06 '13 at 13:15

linello

1 answer

How do you implement Levenshtein distance in Delphi?

I'm posting this in the spirit of answering your own questions. The question I had was: How can I implement the Levenshtein algorithm for calculating edit-distance between two strings, as described here, in Delphi? Just a note on performance: This…

algorithm delphi levenshtein-distance edit-distance

asked Sep 10 '08 at 17:38

JosephStyons

6 answers

Alternative to Levenshtein and Trigram

Say I have the following two strings in my database: (1) 'Levi Watkins Learning Center - Alabama State University' (2) 'ETH Library' My software receives free text inputs from a data source, and it should match those free texts to the pre-defined…

levenshtein-distance string-metric

asked Nov 23 '13 at 13:28

Jonas Sourlier

7 answers

How to install python-levenshtein on Windows?

After searching for days I'm about ready to give up finding precompiled binaries for Python 2.7 (Windows 64-bit) of the Python Levenshtein library, so now I'm attempting to compile it myself. I've installed the most recent version of MinGW32…

python windows levenshtein-distance

asked Nov 02 '12 at 17:33

Hubro

5 answers

How to sort an array by similarity in relation to an inputted word

I have a PHP array, for example: $arr = array("hello", "try", "hel", "hey hello"); Now I want to reorder the array so that the words closest to my $search variable come first. How can I do that?

php arrays search levenshtein-distance

asked Aug 27 '11 at 22:18

AimOn

10 answers

Can't install Levenshtein distance package on Windows Python 3.5

I need to install the Python Levenshtein distance package in order to use this library. Unfortunately, I am not able to install it successfully. I usually install libraries with pip. However, this time I am getting an error: [WinError 2] The system cannot…

python-3.x pip levenshtein-distance

asked Jun 07 '16 at 10:19

hipoglucido

9 answers

Best way to detect similar email addresses?

I have a list of ~20,000 email addresses, some of which I know to be fraudulent attempts to get around a "1 per e-mail" limit, such as username1@gmail.com, username1a@gmail.com, username1b@gmail.com, etc. I want to find similar email addresses for…

c# levenshtein-distance

asked May 11 '10 at 16:03

Chris

1 answer

Is there a sparse edit distance algorithm?

Say you have two strings of length 100,000 containing zeros and ones. You can compute their edit distance in roughly 10^10 operations. If each string only has 100 ones and the rest are zeros then I can represent each string using 100 integers…

algorithm levenshtein-distance

asked Aug 03 '18 at 17:24

Simd
