Questions tagged [levenshtein-distance]

A metric for measuring the amount of difference between two sequences. The Levenshtein distance allows deletion, insertion and substitution.

In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences. The Levenshtein distance between two strings is defined as the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one string into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.

Levenshtein distance is one specific member of the family of edit distances.
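The definition above can be sketched in a few lines of Python. This is a minimal illustration of the usual two-row dynamic programming approach, not a reference implementation. The function name `levenshtein` and the unit cost for every edit are assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] is the distance between the empty prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))

    for i, ca in enumerate(a, start=1):
        # Turning the first i characters of a into "" takes i deletions.
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb, or match
            ))
        previous = current

    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

Only two rows of the table are kept at a time, so memory grows with the shorter string while running time still grows with the product of the two lengths.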

References:
Wikipedia
RosettaCode
Edit Distance (Wikipedia)
Hirschberg's algorithm (Wikipedia)

967 questions
0
votes
0 answers

Implementing a Levenshtein distance function with Theano

I have to compare a lot of enormous strings with each other, and I am using an algorithm like this: def distance(a, b): "Calculates the Levenshtein distance between a and b." n, m = len(a), len(b) if n > m: # Make sure n <= m, to use…
0
votes
1 answer

Android, Java - Fix an OCR-ed word to a valid English dictionary word in real time

My application involves scanning through the phone camera and detecting text. The only words that my application is concerned with are valid English words. I have a list of ~354,000 valid English words that I can compare my scanned word with. Since…
Abdul Wasae
0
votes
1 answer

Similarity on many strings in database

What could be the best way to check if two objects with many properties are similar? Let's say I have an object, an address, which has 10 fields, like: location1, location2, location3, location4, ..., postalCode, owner, habitants. They are all stored…
sandris
0
votes
1 answer

Grouping similar texts

I need help grouping texts. I have a list of merchants like this, and we can see that the first few belong to CENTURYLINK, the next to SMART ATT. Is there a way to group/label these texts with a single label or categorize these texts as per the pool…
pskumar
0
votes
4 answers

(PHP) Matching a user search to an array

First of all, I'm new to development, so I need a little hand-holding. I've built a simple search field (with suggestions) that sends users to a relevant landing page on an array match. I return an error message if the search isn't relevant. However, my…
Ben F
0
votes
1 answer

Swift Trie Levenshtein distance search

I've built a trie data structure that looks like this: struct Trie : Equatable { private var children: [Element: Trie] private var endHere: Bool } to perform autocorrection operations on input from a…
barndog
0
votes
0 answers

Strange out of bounds on Android 6

I'm once again searching for a strange issue :) I've been running an algorithm to calculate the Levenshtein distance, which seemed to work fine until a client started to have issues with a very small number of his customers. (We're talking about 1 out…
SeikoTheWiz
0
votes
1 answer

Using reduce, map or other functions to avoid for loops in Python

I have a working program that calculates the distance and then applies the k-means algorithm. I tested it on a small list and it works fine and fast; however, my original list is very big (>5000), so it takes forever and I ended up terminating…
0
votes
2 answers

plpgsql function calling trigram similarity function inside does not utilize GIN or GIST indexes

I wanted to combine PostgreSQL Levenshtein and trigram similarity functions. The main advantage of the trigram similarity function is that it can utilize GIN or GIST indexes and thus can return fuzzy match results quickly. However, if it is called…
zlatko
0
votes
1 answer

Levenshtein distance: is there another way than comparing the misspelled word with every dictionary word?

I was searching for an AI algorithm for spelling correction and I found the Levenshtein distance algorithm, which compares the similarity between two strings. So my question is: should I implement this similarity between the wrong word and all the words that's…
0
votes
2 answers

Fuzzy Matching Addresses

I am busy writing a simple algorithm to fuzzy match addresses from two datasets. I am calculating the Levenshtein distance between two addresses and then adding the exact match or the shortest match to a matched array. However, this is very slow as…
liamjnorman
0
votes
4 answers

Why is this code producing an exponential loop? .NET, Levenshtein Distance

So recently I embarked on a coding project to try to write some code that mathematically measures how similar two strings are. In my research I found plenty of examples online to help me create the code I desired. I have an error with one…
0
votes
1 answer

Cannot get python packages to work

I am trying to calculate the Levenshtein distance between 2 strings. I tried to install 2 packages (python-levenshtein and pylev). I used Anaconda (on a Win 64 machine) for the install: conda install -c https://conda.anaconda.org/trent pylevenshtein It looks…
sourav
0
votes
1 answer

How to do a cross-row operation in SAS?

I have an email list in a SAS dataset. I want to identify similar email addresses in the list. I am trying to apply the COMPGED function across all the rows for the email variable. I need to sort the list based on similar distance so that similar email…
0
votes
1 answer

What is the best distance metric for places?

I'm searching for a "good" / easy metric to recognize similar places / user input to avoid creating duplicates. Levenshtein distance works well for typos like bakery / bekerry (Levenshtein distance: 2), but "fails" for swapped words St Ursula…
Tobias