Questions tagged [levenshtein-distance]

A metric for measuring the amount of difference between two sequences. The Levenshtein distance allows deletion, insertion and substitution.

In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences. The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.

Levenshtein distance is a specific algorithm of edit distance algorithms.

References:
Wikipedia
RosettaCode
Edit Distance (Wikipedia)
Hirschberg's algorithm (Wikipedia)

967 questions
-1
votes
1 answer

Convert Levenshtein Distance to Error Rate

Is their a way to convert levenstein distances to error rates? With the error rate being the fraction of the sequence that is not exactly the same.
-1
votes
5 answers

Levenshtein distance combination

LD = Levenshtein Distance Just doing a few examples on paper, this seems to work, but does anyone know if this is always true? Lets say I have 3 strings BOT BOB BOM LD(BOT,BOB) = 1 and LD(BOB,BOM)=1 then LD(BOT,BOM)=max(LD(BOT,BOB) ,…
Matt
  • 25,943
  • 66
  • 198
  • 303
-1
votes
1 answer

List Levenshtein scores in order PHP

I have a form that compares 1 word with many and outputs a list of levenshtein scores. How can I get those scores so they list in order, smallest levenshtein score 1st:
user1721230
  • 317
  • 1
  • 6
  • 19
-2
votes
1 answer

Data matching in python (fuzzy, levenstein?)

I would like to do something like an excel fuzzy v-lookup, but in python. I have a list of ~10,000 concatenated strings like: JohnSmith5159LosAngeles JaneDo7729NewYork etc that I would like to look up similar strings among a list formatted the…
-2
votes
1 answer

Find csv lines by word similarity

I have a csv file with thousands of lines. I would like to retrieve only the lines with some similarity regarding a specific word. In this case I am expecting to catch the line 1, 2 and 4. Any idea how to achieve that? import csv a='Microsoft' f =…
anvd
  • 3,997
  • 19
  • 65
  • 126
-2
votes
1 answer

How to find average of strings while using k means clustering?

I am trying to build a message spam filter using k-means and Levenshtein distance (consider this mandatory). I am having a problem understanding how can I compute the average of 'strings' when I have to figure out the centroid for each cluster?
-2
votes
1 answer

Calculate Editing Distance with Levenshtein

Problem: I've coded a Levenshtein String Editing program that appears to work correctly to me, but apparently yields the wrong answer. I think I misunderstand how the Editing Distance is calculated. Comparison for the strings superca and…
dapperdan1985
  • 3,132
  • 2
  • 15
  • 26
-2
votes
1 answer

Java Lambda create a filter with a predicate function which determines if the Levenshtein distance is greater than 2

I have a query to get the most similar value. Well I need to define the minimum Levenshtein distance result. If the score is more than 2, I don't want to see the value as part of the recommendation. String recommendation = …
userit1985
  • 961
  • 1
  • 13
  • 28
-2
votes
2 answers

Group a list of strings by Levenshtein in Python

I created a script to calculate the Levenshtein distance of two strings. Now I want to group a list of strings in base of the Levenshtein distance. (if strings have a distance under a threshold, they will be in the same groub): At the moment, I have…
Sfinos
  • 279
  • 4
  • 15
-2
votes
1 answer

Porting C# Levenshtein Distance to Java

I am developing a spell checker application from an existing one, can any one help me changing this part of code, i can't port pointers or stackalloc to Java because no equivalent exists. A java method with exactly same functionality . public…
Navid
  • 23
  • 1
  • 5
-3
votes
1 answer

What is the utility of the Levenshtein distance?

`So I had a debate with my teacher on what was the best method to quantify how different two given strings are. She told me to use the Levenshtein distance (https://en.wikipedia.org/wiki/Levenshtein_distance if you don't know what it is). The thing…
-3
votes
1 answer

String matching algorithms in python

I am looking for some suggestions on the algorithms which could be used for string matching which also supports non-english languages too. Previously tried algorithm: I have tried Levenshtein distance (Fuzzy matching) with token_sort_ratio…
-3
votes
1 answer

C# implementation of Levenshtein algorithm for substring matching

I'm playing with Levenshtein distance for getting a C# implementation which allows not only to tell whether two strings are similar, but also find a similar string (the needle) in a larger string (the haystack). To this end, I tried to follow the…
Naftis
  • 4,393
  • 7
  • 63
  • 91
-3
votes
1 answer

Missing values and NPE in Levenshtein Algorithm

I need to create this class, but I can't get it to work: public void printTable(): prints table (two-dimensional array). (optional: print the table with the two words in their correct positions in the first row and first column…
Kamikaze
  • 26
  • 1
-3
votes
2 answers

SQLite selection with typo

in my Android app I need to select data from a SQLite database. However, I have a search field in which users can type the name of a location. As they may typo this name, I need to be able to draw the relevant records from the database according to…
ashiswin
  • 637
  • 5
  • 11
1 2 3
64
65