Questions tagged [levenshtein-distance]

A metric for measuring the amount of difference between two sequences. The Levenshtein distance allows deletion, insertion and substitution.

In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences. The Levenshtein distance between two strings is defined as the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one string into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.

Levenshtein distance is one specific member of the family of edit distance metrics.
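
The definition above can be sketched with the standard dynamic-programming approach. This is a minimal illustration rather than a reference implementation: the function name `levenshtein` is an assumption made for this example, every edit is assumed to cost 1, and strings are compared one Python character (Unicode code point) at a time.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # The distance is symmetric, so swap to keep the row buffers as short as possible.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb, or keep a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory use grows with the shorter string while running time remains proportional to the product of the two lengths.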

References:
Wikipedia
RosettaCode
Edit Distance (Wikipedia)
Hirschberg's algorithm (Wikipedia)

967 questions

votes

2 answers

How can I determine Levenshtein distance for Mandarin Chinese characters?

We are developing a system to do fuzzy matching on over 50 international languages using the UTF-8, UTF-16, and UTF-32 Unicode encodings. So far, we have been able to use Levenshtein distance to detect misspellings of German Unicode…

asked Sep 12 '12 at 02:56

Frank


votes

1 answer

Efficiently determine "how sorted" a list is, e.g. Levenshtein distance

I'm doing some research on ranking algorithms, and would like to, given a sorted list and some permutation of that list, calculate some distance between the two permutations. For the case of the Levenshtein distance, this corresponds to calculating…

python sorting permutation levenshtein-distance ranking-functions

asked Nov 21 '11 at 02:24

stefan


votes

2 answers

Algorithm to find edit distance to all substrings

Given two strings s and t, I need to find, for each substring of s, the edit distance (Levenshtein distance) to t. Actually, I need to know, for each position i in s, the minimum edit distance over all substrings starting at position i. For example: t =…

string algorithm levenshtein-distance similarity edit-distance

asked Nov 15 '11 at 16:49

Ivan Bianko


votes

3 answers

String Distance Matrix in Python

How to calculate Levenshtein Distance matrix of strings in Python?

      str1  str2  str3  str4  ...  strn
str1  0.8   0.4   0.6   0.1   ...  0.2
str2  0.4   0.7   0.5   0.1   ...  0.1
…

python string machine-learning text-mining levenshtein-distance

asked May 25 '16 at 06:05

Ajay Jadhav

votes

3 answers

How do diff/patch work and how safe are they?

Regarding how they work, I was wondering about the low-level details: What will trigger a merge conflict? Is the context also used by the tools in order to apply the patch? How do they deal with changes that do not actually modify source code behavior?…

git diff patch levenshtein-distance lcs

asked Nov 05 '15 at 13:20

cenouro

votes

2 answers

How to convert a string into a palindrome with the minimum number of operations?

The problem asks to convert a string into a palindrome with the minimum number of operations. I know it is similar to the Levenshtein distance, but I can't solve it yet. For example, for the input mohammadsajjadhossain, the output is 8.

algorithm dynamic-programming levenshtein-distance

asked Jan 19 '11 at 16:21

user467871

votes

6 answers

Text similarity algorithm

I have two subtitle files. I need a function that tells whether they represent the same text or similar text. Sometimes there are comments like "The wind is blowing... the music is playing" in one file only. But 80% of the contents will…

java text nlp levenshtein-distance similarity

asked Feb 24 '10 at 11:34

EugeneP


votes

1 answer

How to normalise Levenshtein distance for maximum alignment length rather than for string length?

Problem: A few R packages feature Levenshtein distance implementations for computing the similarity of two strings, e.g. http://finzi.psych.upenn.edu/R/library/RecordLinkage/html/strcmp.html. The distances computed can easily be normalised for…

similarity levenshtein-distance edit-distance

asked Apr 13 '12 at 12:34

jvh_ch

votes

3 answers

Levenshtein distance in regular expression

Is it possible to include Levenshtein distance in a regular expression query? (Except by making union between permutations, like this to search for "hello" with Levenshtein distance 1: .ello | h.llo | he.lo | hel.o | hell. since this is stupid and…

regex levenshtein-distance

asked Apr 10 '12 at 09:39

zdenda.online


votes

5 answers

Levenshtein distance symmetric?

I was informed that Levenshtein distance is symmetric. When I used Google's diffMatchPatch tool, which computes Levenshtein distance among other things, the results don't imply that Levenshtein distance is symmetric, i.e., Levenshtein(x1,x2) is not equal to…

algorithm levenshtein-distance

asked Mar 15 '12 at 14:39

user1271793

votes

9 answers

Efficient string similarity grouping

Setting: I have data on people and their parents' names, and I want to find siblings (people with identical parent names). pdata<-data.frame(parents_name=c("peter pan + marta steward", "pieter pan + marta…

r string performance levenshtein-distance

asked Jan 02 '18 at 08:59

sheß

votes

9 answers

How do I convert between a measure of similarity and a measure of difference (distance)?

Is there a general way to convert between a measure of similarity and a measure of distance? Consider a similarity measure like the number of 2-grams that two strings have in common.

2-grams('beta', 'delta') = 1
2-grams('apple', 'dappled') = 4

What…

metrics string-comparison levenshtein-distance

asked Oct 31 '10 at 19:06

135498

votes

2 answers

Calculating a relative Levenshtein distance - make sense?

I am using both Daitch-Mokotoff soundexing and Damerau-Levenshtein to find out if a user entry and a value in the application are "the same". Is Levenshtein distance supposed to be used as an absolute value? If I have a 20 letter word, a distance of…

compare fuzzy words linguistics levenshtein-distance

asked Oct 06 '10 at 19:46

Joseph Tura


votes

3 answers

Matching an approximate string in a Core Data store

I have a small problem with the Core Data application I'm currently writing. I have two different models, contexts, and persistent stores. One is for my app data, the other is for a website with information relevant to me. Most of the time, I match…

cocoa string core-data levenshtein-distance

asked May 19 '09 at 10:18

damdamdam

votes

6 answers

Is there an edit distance algorithm that takes "chunk transposition" into account?

I put "chunk transposition" in quotes because I don't know whether there is a technical term or what it should be. Just knowing if there is a technical term for the process would be very helpful. The Wikipedia article on edit distance gives some good…

algorithm language-agnostic levenshtein-distance edit-distance

asked May 18 '09 at 14:44

Steven Huwig

