Questions tagged [levenshtein-distance]

A metric for measuring the amount of difference between two sequences. The Levenshtein distance allows deletion, insertion and substitution.

In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences. The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.

Levenshtein distance is a specific algorithm of edit distance algorithms.

References:
Wikipedia
RosettaCode
Edit Distance (Wikipedia)
Hirschberg's algorithm (Wikipedia)

967 questions

votes

0 answers

Fuzzysearch: Looping over strings and keep the most similar

I have a number of correctly spelled names in a list L_name = ['Julia', 'John', 'James', 'Jay', 'Jordan'] And I have a list of results from a form where people entered their name L_entries = ['Julie', 'John', 'Jo', 'Jamie', 'Jamy', 'James', 'Jay',…

asked Sep 17 '19 at 06:45

user9092346

votes

1 answer

String matching function between two columns using Levenshtein distance in PySpark

I am trying to compare pairs of names by converting the levenshtein distance between them to a matching coef such as : coef = 1 - Levenstein(str1, str2) / max(length(str1) , length(str2)) However, when I implement it in PySpark using withColumn(),…

python dataframe apache-spark pyspark levenshtein-distance

asked Sep 05 '19 at 12:36

Melvin C.

votes

1 answer

Negative array index in a pseudocode

Consider this sample of a pseudocode for Levenshtein Distance: function Leven_dyn(S1[1..m], S2[1..n], m, n); var D[0..m, 0..n]; begin for i := 0 to m do D[i, 0] := i; end for for j := 0 to n do D[0, j] := j; end for for i := 0 to m…

algorithm language-agnostic pseudocode levenshtein-distance

asked Sep 03 '19 at 12:23

weno

votes

1 answer

How to compute word per token word distance and return the count of 0 distance in a column

I got two descriptions, one in a dataframe and other that is a list of words and I need to compute the levensthein distance of each word in the description against each word in the list and return the count of the result of the levensthein distance…

python python-3.x string nlp levenshtein-distance

asked Aug 24 '19 at 04:33

ScottUrbina

votes

1 answer

Quick search for a start location of a given string

Here, I would like to match a given string match_text to a longer string text. I want to find match_text's start location in text, the closest one (you can assume that there is only one location). My current version of the code is to for loop…

python levenshtein-distance

asked Jul 18 '19 at 19:13

titipata

5,321
3
35
59

votes

1 answer

I've been trying to match the a name received from 2 sources with each other and check if they are almost a match or not

In the sample data, I've listed the names of employers of a particular person(a prospective customer) which we received from 2 different sources. I've been trying to find a way to better match the two names and get good results. (Currently, it's…

string-comparison similarity levenshtein-distance fuzzy-logic edit-distance

asked Jul 18 '19 at 10:29

Yashu046

votes

0 answers

Is there a way to measure the similarity of a group of strings? A kind of 'mass' Levenshtein distance?

Levenshtein distance is a gauge of the distance between two strings. Is there a similar metric for assessing how similar a group of strings are to each other, a kind of mean or group distance? EDIT: I realize now that my question above could be…

string algorithm levenshtein-distance

asked Jun 18 '19 at 12:51

Chris T

votes

1 answer

ImportError: cannot import name 'ratio' from 'Levenshtein' (unknown location)

When I try import fuzzymatcher I get the error: ImportError: cannot import name 'ratio' from 'Levenshtein' (unknown location)

levenshtein-distance

asked Jun 14 '19 at 04:04

Ee Ann Ng

votes

1 answer

Fuzzy Lookup - Levenshtein with Syllable and Substring Match

What i tried to do is, create a fuzzy lookup algortihm that helps us to match these unique records with our mapping table by and shows us the percentage. I am applying Levenshtein algorithm to find the matches which is well known algorithm for fuzzy…

excel vba levenshtein-distance fuzzy-search

asked May 25 '19 at 11:57

MehmetCanbulat

votes

2 answers

similarity score between phrases

Levenshtein distance is an approach for measuring the difference between words, but not so for phrases. Is there a good distance metric for measuring differences between phrases? For example, if phrase 1 is made of n words x1 x2 x_n, and phrase 2 is…

python similarity levenshtein-distance sentence-similarity

asked Apr 11 '19 at 18:22

user1424739

11,937
17
63
152

votes

1 answer

clustering set of string sentences into unknown number of groups

I have a set of sentences (each sentence = x number of rows where x belongs to range (1,6)). I want to group these sentences based on the similarities between them. I have tried fuzzy wuzzy.token_set_ration but the trouble I have is that I need to…

python cluster-analysis levenshtein-distance fuzzywuzzy

asked Apr 10 '19 at 14:22

naivepredictor

votes

1 answer

How to call Levenshtien Function using the values from two different tables in T-SQL

I am trying to find the Levenshtien distance between the columns of two different tables TableA and TableB. Basically I need to match ColumnA of TableA with all the elements of ColumnB in TableB and find the Levenshtien Distance I have created a…

sql-server tsql optimization levenshtein-distance

asked Mar 02 '19 at 14:53

user5593950

votes

1 answer

Finding words in long string within edit distance ignoring whitespace

I am looking for a algorithm to efficiently search for words within given edit distance in a query string while ignoring whitespace. For e.g. If words on which I need to build an index are: OHIO, WELL and query String: HELLO HI THERE H E L L O…

java text-processing levenshtein-distance fuzzy-search edit-distance

asked Feb 21 '19 at 04:04

Gurwinder Singh

38,557
6
51
76

votes

1 answer

identifier splitting to approximately match documentation

Different software projects have different coding convention; even in the same project there may be different languages used and will have different convention. What is good for searching documentation (which appear outside the source files), with…

algorithm full-text-search query-string levenshtein-distance

asked Mar 21 '11 at 12:34

Tathagata

1,955
4
23
39

Prev 1 2 3

…

64 65 Next