Questions tagged [levenshtein-distance]

A metric for measuring the amount of difference between two sequences. The Levenshtein distance allows deletion, insertion and substitution.

In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences. The Levenshtein distance between two strings is defined as the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one string into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.

Levenshtein distance is one specific kind of edit distance.
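As an illustration of the definition, here is a minimal Python sketch of the usual dynamic-programming approach. The function name `levenshtein` and the two-row layout are choices made for this example, not something prescribed by the tag description.

```python
def levenshtein(a, b):
    """Return the minimum number of insertions, deletions, and
    substitutions needed to turn sequence a into sequence b."""
    # Row 0 of the table: turning an empty prefix of a into the
    # first j elements of b takes j insertions.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        # Turning the first i elements of a into an empty sequence
        # takes i deletions.
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute, or free match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows are kept in memory at a time, so the sketch uses O(len(b)) space and O(len(a) * len(b)) time.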

967 questions
0
votes
2 answers

How to effectively use the Levenshtein algorithm for text auto-completion

I'm using the Levenshtein distance algorithm to filter through some text in order to determine the best matching result for the purpose of text field auto-completion (and top 5 best results). Currently, I have an array of strings, and apply the…
Ryan Dias
  • 270
  • 2
  • 11
0
votes
1 answer

Getting each step from Levenshtein distance

I have a Java program that calculates the Levenshtein distance between two strings. I use this method to do it: public static int levDistance(String s, int len_s, String t, int len_t) { if (len_s == 0) return len_t; …
Loovjo
  • 534
  • 8
  • 23
0
votes
1 answer

Sort Array by combining orders from multiple Arrays

I'm making a simple search engine, and I have already indexed a lot of websites in a MySQL database. Now I would like to get a relevant list of results by keywords. Websites are indexed in my database with the following columns : hostname (without…
Maxime R.
  • 133
  • 1
  • 9
0
votes
1 answer

Levenshtein distance in Python - wrong result with national characters

I found a similar topic, Levenshtein distance on diacritic characters, but it is about PHP and I write in Python. Still, the problem remains the same. For instance: levenshtein(kot, kod) = 1, but levenshtein(się, sie) = 2, which is wrong. Any ideas on how to solve…
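One plausible cause of a result like this, assuming the strings are being compared as UTF-8 bytes rather than as Unicode characters, is that "ę" occupies two bytes. The Python 3 sketch below shows the difference; the `levenshtein` helper is a generic implementation written for this example, not the asker's code.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (str or bytes)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (x != y),  # substitution or match
            ))
        previous = current
    return previous[-1]


word_a = "si\u0119"  # "się", with ę as a single code point (U+0119)
word_b = "sie"

# Compared as Unicode strings: one substitution (ę -> e).
print(levenshtein(word_a, word_b))  # 1

# Compared as UTF-8 bytes: ę is the two bytes C4 99, so the
# distance becomes one substitution plus one deletion.
print(levenshtein(word_a.encode("utf-8"), word_b.encode("utf-8")))  # 2
```

If that is the cause, decoding the input to Unicode strings before computing the distance should give the expected result. Text that uses combining characters may additionally need normalization, which this sketch does not cover.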
0
votes
1 answer

Revise existing Levenshtein distance code to accommodate different operation costs

I have found a lot of sources that determine Levenshtein distances (LD) between two strings. However, all of them assume that the costs for substitution, insertion, and deletion operations are all set to 1. This source for C++ is very efficient, which I…
user2191247
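For the operation-cost question above, a generic way to support different costs is to replace each constant 1 in the recurrence with a parameter. The following Python sketch is not based on the C++ source the question refers to; the function name `weighted_levenshtein` and the parameters `ins_cost`, `del_cost`, and `sub_cost` are hypothetical names chosen for illustration.

```python
def weighted_levenshtein(a, b, ins_cost=1, del_cost=1, sub_cost=1):
    """Minimum total cost of turning a into b with per-operation costs."""
    # Building the first j elements of b from nothing costs j insertions.
    previous = [j * ins_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        # Removing the first i elements of a costs i deletions.
        current = [i * del_cost]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + del_cost,                         # delete x
                current[j - 1] + ins_cost,                      # insert y
                previous[j - 1] + (0 if x == y else sub_cost),  # substitute
            ))
        previous = current
    return previous[-1]


# With unit costs this is the ordinary Levenshtein distance.
print(weighted_levenshtein("kot", "kod"))              # 1
# An expensive substitution is replaced by delete + insert.
print(weighted_levenshtein("kot", "kod", sub_cost=3))  # 2
```

Note that when insertion and deletion costs differ, the result is no longer symmetric in its two arguments.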
0
votes
0 answers

R - Updating a Dataframe Column

I have a data frame with 2 columns that contain two different types of text. The first column contains codes that are strings in the form DD-HI-HO (DD being the code). Column 2 is free text that anyone can insert. I am trying to populate the third…
John Smith
  • 2,448
  • 7
  • 54
  • 78
0
votes
2 answers

How to count the number of (number of duplicates) in a database

I am writing some PHP / MySQL to detect excessive site visits. While I find it simple enough to detect, for any given IP address, how many times that person has visited in say 24 hours and so whether they have exceeded some number that is the visit…
Frankie
  • 596
  • 3
  • 24
0
votes
1 answer

Efficiently Finding Closest Matching Existing String to Simple Wildcard String

I have a table of strings that are identifiers, but each identifier can be either fixed or a variable identifier with some static pieces. For example, an identifier could be ABC12345 or an identifier could be DEF**45 where * represents any…
user3170736
  • 511
  • 5
  • 24
0
votes
1 answer

Modifying Levenshtein distance to consider positions, while being symmetric

The Levenshtein distance to convert SATURDAY to SUNDAY is 3. One way is: delete A at pos 2, delete T at pos 3, substitute R at pos 5 by N. If I take the position number as the weight for positions, the cost will be: 2 + 3 + 5 = 10. Similarly, if I…
Bruce
  • 945
  • 3
  • 12
  • 24
0
votes
0 answers

How can I edit this Levenshtein method to return the actual differences?

I found this ruby method on a wiki that finds the number of differences between two strings. def levenshtein(first, second) matrix = [(0..first.length).to_a] (1..second.length).each do |j| matrix << [j] + [0] * (first.length) end …
0
votes
0 answers

Missing/additional words when comparing texts

I want to compare two text files. I don't have a problem when there's only a spelling mistake (a missing character, a wrong one, or an additional one), but the problem is when there is a missing line/word or an additional one. In my research, I found…
0
votes
1 answer

Levenshtein-distance algorithm

def worddistance(source, target): ''' Return the Levenshtein distance between 2 strings ''' if len(source) > len(target): source, target = target, source #Now target becomes the larger string, if it is 0, surely len(source)…
Gavin
  • 2,784
  • 6
  • 41
  • 78
0
votes
1 answer

PHP - Is this Levenshtein distance recursive algorithm so slow or am I wrong?

I saw this Levenshtein formula on Wikipedia. I have implemented this algorithm recursively (I know that is an inefficient way to implement it, but I wanted to see how inefficient it was). Here is the code (in PHP): function…
tonix
  • 6,671
  • 13
  • 75
  • 136
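On the recursion question above: a plain recursive implementation recomputes the same prefix pairs many times, so its running time grows exponentially with the string lengths. Caching each subproblem brings it down to roughly len(s) * len(t) calls. Below is a minimal sketch in Python rather than PHP; `levenshtein_recursive` and `dist` are names invented for this example, and it is not a translation of the asker's function.

```python
from functools import lru_cache


def levenshtein_recursive(s, t):
    """Recursive Levenshtein distance with memoized subproblems."""

    @lru_cache(maxsize=None)
    def dist(i, j):
        # Distance between the first i characters of s
        # and the first j characters of t.
        if i == 0:
            return j
        if j == 0:
            return i
        cost = 0 if s[i - 1] == t[j - 1] else 1
        return min(
            dist(i - 1, j) + 1,         # deletion
            dist(i, j - 1) + 1,         # insertion
            dist(i - 1, j - 1) + cost,  # substitution or match
        )

    return dist(len(s), len(t))


print(levenshtein_recursive("kitten", "sitting"))  # 3
```

The recursion depth is at most len(s) + len(t), so very long inputs may still hit Python's recursion limit; an iterative table avoids that.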
0
votes
0 answers

How to limit amount of records in a double table with a function mysql query

What is the correct way to limit the amount of records from list1: (original source) select `List 1`.`name`, `List 2`.`name`, levenshtein_ratio(`List 1`.`name`, `List 2`.`name`) FROM `List 1`, `List 2` The following gives 0 as result: select…
boboloco
  • 1
  • 1
0
votes
1 answer

Levenshtein excel for huge set of data

I'm trying to use a Levenshtein algorithm I found here to clean a huge amount of data, but I'm having trouble implementing it. I have 100,000 rows of Excel data. One of the columns contains a city name, and these have multiple typos (hence Levenshtein). I have a…