Questions tagged [edit-distance]

A string metric describing the differences between two strings. More specifically, it is the minimum number of operations needed to transform one string into another. Operations include the insertion, deletion, substitution, or transposition of a character in the string. Operations can be considered in combinations and may have different costs.
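As a rough illustration of the definition above, here is a minimal Python sketch of the unit-cost Levenshtein variant (insertion, deletion, and substitution only, each costing 1). The function name `levenshtein` is chosen for this example; transpositions and weighted costs are deliberately left out.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance using insertion, deletion and substitution."""
    rows, cols = len(a) + 1, len(b) + 1
    # dp[i][j] holds the distance between the prefixes a[:i] and b[:j].
    dp = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dp[i][0] = i  # delete all i characters of a[:i]
    for j in range(cols):
        dp[0][j] = j  # insert all j characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[-1][-1]


print(levenshtein("kitten", "sitting"))
```

The table-based form uses O(len(a) * len(b)) time and memory, which is the baseline many of the questions below try to improve on.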

References

Edit distance (Wikipedia)

256 questions
1
vote
2 answers

Kendall tau distance in Scala

Is this a correct implementation of Kendall tau distance in Scala?
def distance[A : Ordering](s: Seq[A], t: Seq[A]): Int = {
  assert(s.size == t.size, "Both sequences should be of the same length")
  s.combinations(2).zip(t.combinations(2)).count {…
Vilius Normantas
1
vote
1 answer

efficiently compute the edit distance between 1 string and a large set of other strings?

The use case is auto-complete options, where I want to rank a large set of other strings by how similar they are to a fixed string. Is there any bastardization of something like a DFA RegEx that can do a better job than starting over on each option…
BCS
1
vote
1 answer

How to reconstruct strings in "edit_distance_problem"?

Suppose you are given the dp table for string X = "AGGGCT" and string Y = "AGGCA", with m = length of X + 1 and n = length of Y + 1:
dp[m][n] =
0 1 2 3 4 5
1 0 1 2 3 4
2 1 0 1 2 3
3 2 1 0 1 2
4 3 2 1 1 2
…
Vikrant Singh
1
vote
1 answer

Implementing edit distance method using recursion results in object heap error

private static int editDistance(ArrayList s1, ArrayList s2) {
    if (s1.size() == 0) {
        return s2.size();
    } else if (s2.size() == 0) {
        return s1.size();
    } else { …
Terry Li
1
vote
1 answer

Levenshtein distance in C with required memory of O(m)

I'm writing code that calculates the edit distance of two given strings t and s, with m = strlen(t) and n = strlen(s), and the code should only use memory in O(m). Furthermore, it should take no longer than 4 seconds for the calculation of two…
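One common way to meet a linear memory bound like the one described here is to keep only two rows of the DP table. The sketch below is in Python rather than C and is not taken from the question; the name `levenshtein_two_rows` is hypothetical, and swapping the strings so that the shorter one indexes the row is an assumption made for this example.

```python
def levenshtein_two_rows(s: str, t: str) -> int:
    """Levenshtein distance that stores only the previous and current rows."""
    # Put the shorter string along the row so memory is O(min(len(s), len(t))).
    if len(t) > len(s):
        s, t = t, s
    prev = list(range(len(t) + 1))  # row for the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty string
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


print(levenshtein_two_rows("kitten", "sitting"))
```

The running time is still proportional to the product of the two lengths; only the memory footprint changes.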
1
vote
0 answers

Levenshtein-Damerau Distance-Calculation with a Max-Distance-of-Interest Bound

Consider the C# implementation of the LD-distance calculation algorithm suggested on this Wiki page. I'd like to extend it with the ability to abort the calculation once a certain (predefined) distance threshold has already been…
Bliss
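The early-abort idea raised in this question can be sketched as follows. This is a simplified Python illustration, not the C# code the question refers to: it covers plain Levenshtein only (no transpositions), and the name `bounded_levenshtein` and the `max_dist` parameter are hypothetical. It relies on the fact that the final distance can never be smaller than the minimum value in any completed row.

```python
from typing import Optional


def bounded_levenshtein(a: str, b: str, max_dist: int) -> Optional[int]:
    """Return the Levenshtein distance, or None if it exceeds max_dist."""
    # The length difference is a lower bound on the distance.
    if abs(len(a) - len(b)) > max_dist:
        return None
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        # Every alignment passes through this row, so if all cells already
        # exceed the bound, the final result must exceed it as well.
        if min(curr) > max_dist:
            return None
        prev = curr
    return prev[-1] if prev[-1] <= max_dist else None


print(bounded_levenshtein("kitten", "sitting", 3))
print(bounded_levenshtein("kitten", "sitting", 2))
```

Extending the same cutoff to a transposition-aware variant would need extra care, because that recurrence also looks two rows back.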
1
vote
2 answers

Edit distance with varying dictionaries

My question is similar to "Algorithm to transform one word to another through valid words", but with a major difference. I have one fixed word, say "JAMES", and varying dictionaries as input. Of course, I can't preprocess the dictionary now. So I have to…
Anirvana
1
vote
2 answers

Levenshtein edit distance algorithm that supports Transposition of two adjacent letters in C#

I'm searching for a C# implementation of an algorithm for computing Levenshtein edit distance that also supports the case in which two adjacent letters are transposed. For example, the words "animals" and "ainmals": switching between the…
Hady Elsahar
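A transposition-aware distance of the kind asked about here can be sketched with the optimal string alignment (restricted Damerau-Levenshtein) recurrence. The example is in Python rather than C#, the name `osa_distance` is chosen for illustration, and the restricted variant is an assumption: it never edits a substring more than once, so it can differ from the unrestricted Damerau-Levenshtein distance.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein plus adjacent swaps."""
    rows, cols = len(a) + 1, len(b) + 1
    # dp[i][j] holds the distance between the prefixes a[:i] and b[:j].
    dp = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dp[i][0] = i
    for j in range(cols):
        dp[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            best = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
            # Two adjacent characters swapped: count it as a single edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                best = min(best, dp[i - 2][j - 2] + 1)
            dp[i][j] = best
    return dp[-1][-1]


print(osa_distance("animals", "ainmals"))
```

For the pair from the question, the swap of "n" and "i" counts as one edit, whereas plain Levenshtein would count two substitutions.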
0
votes
1 answer

getting segmentation fault on multi dimension array while calculating Levenshtein Distance

I was trying to calculate the Levenshtein distance. The following code works for small strings, e.g. kit/fit or sitting/knit, but it gave me a segmentation fault for the strings sunday/saturday. After using GDB (for the first time), I figured the problem is…
0
votes
1 answer

Finding the subset of a dictionary that has the minimum edit distance to a given string

I'm looking for the most efficient way of solving a Levenshtein edit distance problem. We are given as input:
A set of strings S of size n <= 8, with average length m <= 50
A target string t of length l <= 50
Our task is to 'align' t with S…
0
votes
0 answers

Find out edit distance between two strings

I am calculating the edit distance between two dataframes. Both dataframes consist of ~30L rows, and because they are so large it takes a lot of time. Is there any way to improve the performance? for i in range(0,len(targets1)): if i % 100…
A14
0
votes
0 answers

Edit Distance graph in NetworkX

I have 2 graphs created with networkx: G_1 has 23 edges and 15 nodes, G_2 has 22 edges and 13 nodes. When I run the function nx.graph_edit_distance(G_1, G_2), it takes 20 min to run. However, when I run it on my graphs G4_ and G_5 that have 5 nodes and…
MzBen
0
votes
1 answer

group_by edit distance between rows over multiple columns

I have the following data frame. Input:
class id q1 q2 q3 q4
Ali   12 1  2  3  3
Tom   16 1  2  4  2
Tom   18 1  2  3  4
Ali   24 2  2  4  3
Ali   35 2  2  4  3
Tom   36 1  2  4  2
class indicates the…
Sandy
0
votes
0 answers

Reducing time for minimum edit distance in python?

I am trying to create a list with the edit distances between each word in a set of documents, ranging from 10k-42k words. If my idea of edit distance is correct, I would end up with a distance for each word compared to every single other word. So if…
0
votes
0 answers

Numpy implementation of Edit Distance Algorithm

I am new to NumPy (and Python) and am working on implementing the edit distance algorithm with NumPy. This is my code so far. I get an error on the first line after the else: . The error says: "index 3 is out of bounds for axis 0 with size 2". I'm very…
Moronis2234