Questions tagged [levenshtein-distance]

A metric for measuring the amount of difference between two sequences. The Levenshtein distance allows deletion, insertion and substitution.

In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences. The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.

Levenshtein distance is a specific algorithm of edit distance algorithms.

References:
Wikipedia
RosettaCode
Edit Distance (Wikipedia)
Hirschberg's algorithm (Wikipedia)

967 questions

votes

2 answers

Levenshtein distance. Max distance exception

I have this levenstein algorithm: public static int? GetLevenshteinDistance(string input, string output, int maxDistance) { var stringOne = String.Empty; var stringTwo = String.Empty; if (input.Length >=…

c# string algorithm levenshtein-distance

asked Feb 28 '18 at 16:16

user3499880

votes

0 answers

Using Hamming distance to group a list of DNA sequences into similar sequences (Python)?

I've extracted a list of DNA sequences from a FASTQ file and its currently being stored in a list for now. sequences = ['ATCT','ATTT','ACGG','ACCG','ACGT','AGGT','ATGC','ATCC','AGTT'] I want to cluster sequences into a list of tuples so that each…

python string string-matching levenshtein-distance hamming-distance

asked Feb 22 '18 at 18:14

superasiantomtom95

votes

2 answers

Match items from two sets of data by highest % of similarities

php matching levenshtein-distance

asked Feb 16 '18 at 11:29

Igor Gorbenko

votes

1 answer

Levenshtein distance

I would like to know if there is any algorithm that returns the amount of insertions, deletions and substitutions between two words. Most algorithms only return and integer with the distance between the two words but I would like to also have how…

nlp levenshtein-distance

asked Feb 13 '18 at 14:34

m4sh4

votes

1 answer

Levenshtein distance Python UDF as fuzzy matching proxy in SQL join

I came across a forum post that describes a method of creating a Python UDF in Redshift: https://community.periscopedata.com/r/y715m2. More info about Python UDFs in Redshift:…

python sql statistics amazon-redshift levenshtein-distance

asked Feb 09 '18 at 15:32

user8834780

1,620
3
21
48

votes

1 answer

Which algorithm to match most similar string from a set?

Let's say I have a database of books that includes their titles. For a given listing from eBay or Craigslist or some other such site, I want to compare its title string to all of the book titles in my database to try to find a match. It's unlikely…

string algorithm language-agnostic levenshtein-distance n-gram

asked Jan 12 '18 at 23:54

user457586

votes

1 answer

vectorized text mining over multiple columns

I have some code that I would like to vectorize but I am not sure how. The following code gives some example data, comprised of names and addreses. name <- c("holiday inn", "geico", "zgf", "morton phillips") address <- c("400 lafayette pl tupelo…

r for-loop vectorization tm levenshtein-distance

asked Jan 08 '18 at 16:31

jvalenti

votes

1 answer

How to compute multiple related Levenshtein distances?

Given two strings of equal length, Levenshtein distance allows to find the minimum number of transformations necessary to get the second string, given the first. However, I'd like to find a way to adjust the alogrithm for multiple pairs of strings,…

levenshtein-distance

asked Jan 26 '11 at 20:13

user490735

votes

1 answer

Python - Assign the closest string from List A to List B based on Levenshtein distance - (ideally with pandas)

As introduction, I am pretty new to python, I just know how to use pandas mainly for data analysis. I currently have 2 lists of 100+ entries, "Keywords" and "Groups". I would like to generate an output (ideally a dataframe in pandas), where for…

python string python-3.x pandas levenshtein-distance

asked Dec 20 '17 at 14:54

Roberto Bertinetti

votes

0 answers

Python with mongodb

I am very beginner to python. I have two tables named as Table A and Table B, In Table A have 1M record is available and Table B have 14M records is available and each record is a very big sentence(Paragraph) with special character numbers etc.., I…

python mongodb python-3.x levenshtein-distance

asked Dec 16 '17 at 11:57

ramki

votes

1 answer

Fuzzy string search in array with postgresql

This is how I do fuzzy string search in postgresql: select * from table where levenshtein(name, 'value') < 2; But what can I do if the 'name' colum contains array? P.S.: It is necessary to use index. And this is the difference.

postgresql levenshtein-distance fuzzy-search

asked Dec 09 '17 at 19:01

kz_sergey

votes

1 answer

Levenshtein implementation in Clojure with memoization

This is a minimal Levenshtein (edit distance) implementation using Clojure with recursion: (defn levenshtein [s1, i1, s2, i2] (cond (= 0 i1) i2 (= 0 i2) i1 :else (min (+ (levenshtein s1 (- i1 1) s2 i2) 1) (+ (levenshtein…

clojure memoization levenshtein-distance

asked Dec 07 '17 at 15:38

gil.fernandes

12,978
5
63
76

votes

2 answers

pandas algorithm slow: for loops and lambda

summary: I am searching for misspellings between a bunch of data and it is taking forever I am iterating through a few CSV files (million lines total?), in each I am iterating through a json sub-value that has maybe 200 strings to search for. For…

python pandas levenshtein-distance

asked Sep 27 '17 at 13:42

jleatham

votes

0 answers

How can I visualize all changes in one string compared to another?

Currently, I use https://jsfiddle.net/MartinThoma/h9kL6zox/1/ (see this answer) to highlight changes from one string (< 255 chars) to another string (<255 chars). I can only add highlighting code to one of them. There are three types of changes…

javascript string algorithm levenshtein-distance

asked Jul 27 '17 at 07:39

Martin Thoma

124,992
159
614
958

votes

1 answer

Python/Pandas - String Comparisons

I have a list of strings/narratives which I need to compare and get a distance measure between each string. The current code I have written works but for larger lists it takes along time since I use 2 for loops. I have used the levenshtien distance…

python pandas for-loop levenshtein-distance

asked Jun 15 '17 at 12:59

Bryce Ramgovind

3,127
10
41
72

Prev 1 2 3

…

64 65 Next