Questions tagged [levenshtein-distance]

A metric for measuring the amount of difference between two sequences. The Levenshtein distance allows deletion, insertion and substitution.

In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences. The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.

Levenshtein distance is a specific algorithm of edit distance algorithms.

References:
Wikipedia
RosettaCode
Edit Distance (Wikipedia)
Hirschberg's algorithm (Wikipedia)

967 questions
0
votes
1 answer

Finding similar strings in large datasets

I'm using levenshtein distance to retrieve similar strings from a list. At the moment the list has just a few thousand items, but we'll need to support at least 100k items. I'm trying to make this more efficient and one technique I came up with was…
webber
  • 1,834
  • 5
  • 24
  • 56
0
votes
1 answer

Bad levenshtein results

I have a string comparison function based on levenshtein but it don't work properly. function levenshteinTest($input, $array) { $shortest = -1; foreach ($array as $word) { $lev = levenshtein($input, $word); if ($lev == 0) { …
Lithilion
  • 1,097
  • 2
  • 11
  • 26
0
votes
1 answer

Returning a list of fuzzy matches for a given string in Python?

I've seen a lot of ways of checking if two given strings are fuzzy matches, but I want to create a list of potential fuzzy matches for one given string so I can search through a huge list for them. The purpose of my code is to see if a given…
user3753722
  • 153
  • 2
  • 11
0
votes
1 answer

Redundancy in Levenshtein distance algorithm

In the typical dynamic Levenshtein distance algorithm, to compute the value of cell d[i][j], where i and j are the row and column numbers, respectively, we take the minimum of d[i-1][j-1]+0/1, d[i-1][j]+1 and d[i][j-1]+1. However, it seems to me…
Sean Kelleher
  • 1,952
  • 1
  • 23
  • 34
0
votes
1 answer

Missing Functions after install of Postgis 2.1

I have taken over a site that I did not build. The database is all messed up, and I am reloading the database from a schema dump file. The database also includes postgis 2.1. I am using: Postgresql 9.3 CentOS 6.5 Ruby 1.9.3 Ruby on Rails 3 The…
Big Al Ruby Newbie
  • 834
  • 1
  • 10
  • 30
0
votes
2 answers

Levenshtein Trie wrong distance

i searched the web for an implementation of a levenshtein trie and i found this: Levenshtein Distance Challenge: Causes. i tried to add a piece of code to normalize the words. If a word for example has 5 letters ('Apple') and i have this word…
Mulgard
  • 9,877
  • 34
  • 129
  • 232
0
votes
1 answer

levinshtein distance in mysqli prepared statement

I have the following mysqli prepare statement: $incity = "%{$mysqli->real_escape_string($city)}%"; $instreet = $mysqli->real_escape_string($street); $incp = "%{$mysqli->real_escape_string($cp)}%"; $mysqli->prepare("SELECT zonasrepartoid, calle,…
myhappycoding
  • 648
  • 7
  • 20
0
votes
1 answer

Calculate Levenshtein distance using pandas DataFrames

I'm trying to calculate Levenshtein distance for the following pandas DataFrame. I'm using this package for it. In [22]: df = pd.DataFrame({'id' : [1,2,3,4,5,6,7], 'path' :…
Nilani Algiriyage
  • 32,876
  • 32
  • 87
  • 121
0
votes
2 answers

Constructing a Graph of Strings (Levenshtein Distance)

Currently, in my computer-science course, we are discussing graphs and how to find shortest distance using graphs. I received an assignment about a week ago where the teacher gave us the code for a graph using integers, and we have to adapt it to be…
Tristan
  • 55
  • 1
  • 8
0
votes
1 answer

using levenshtein with whitespaces for full text search

Right now I have a function that searches all posts of a certain user for key words (specified by the user), and return any posts that have matches for all of the key words. public function fullTextSearch($text, $userId, $offset = 0, $limit = 0) { …
user3245442
  • 43
  • 1
  • 8
0
votes
1 answer

how to use the Levenshtein distance string metric

Alright I kind of understand how this is used in theory but how would I actually put it into a program because all the example I look at are not in a code I am not asking for a code writing out just a little hint to put me in the right direction so…
0
votes
4 answers

Is there an efficient implementation for quantifying the similarity between two Strings?

Let's say I have several very long Strings consisting of completely random characters. I aim to represent their similarity to one designated master String in a number. For example: 12345 is very similar 23456, but not so similar to 12abcdef Assuming…
Andreas Hartmann
  • 1,825
  • 4
  • 23
  • 36
0
votes
0 answers

how to get a matching status of two string using Levenshtein distance

is it possible to get a matching status from two compared strings, for example: string 1 : this is a dog. string 2 : this is not a cat. How do I generate the below string using Levenshtein distance, could anybody help? string 1 : this is a …
0
votes
0 answers

Postgres comparing content with ground truth table

Given the following tables (each containing appr. 2 mio datasets): movie(title, genre, price) ground_truth_movie(title, genre) movie: | title | genre | price | |***************************|***********|*******| | Bria…
Dario Behringer
  • 287
  • 2
  • 3
  • 10
0
votes
0 answers

Find best matching string using Levenshtein distance algorithm

I am trying to implement the Levenshtein distance algorithm to find the most similar match The list is like List values = new ArrayList(); values.add("*"); values.add("3.63.161.2"); values.add("3*"); …
NullPointerException
  • 3,732
  • 5
  • 28
  • 62