Questions tagged [edit-distance]

A string metric that quantifies the difference between two strings. More specifically, it is the minimum number of operations needed to transform one string into the other. Operations include the insertion, deletion, substitution, or transposition of a character. Operations can be combined and may be assigned different costs.
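
As a rough illustration of the definition above, here is a minimal Python sketch of the classic dynamic-programming computation for the insertion/deletion/substitution case (Levenshtein distance). The function name `edit_distance` and the cost parameters `ins_cost`, `del_cost`, and `sub_cost` are assumptions made for this example, not part of any particular library, and transpositions are deliberately left out.

```python
def edit_distance(a: str, b: str,
                  ins_cost: int = 1, del_cost: int = 1, sub_cost: int = 1) -> int:
    """Minimum total cost of insertions, deletions, and substitutions
    that turn string a into string b (hypothetical helper, no transpositions)."""
    # prev[j] = cost of turning the already-processed prefix of a into b[:j].
    # Initially that prefix is empty, so only insertions are needed.
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        curr = [i * del_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + del_cost,                           # delete ca
                curr[j - 1] + ins_cost,                       # insert cb
                prev[j - 1] + (0 if ca == cb else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(edit_distance("kitten", "sitting", sub_cost=2))
```

The sketch keeps only two rows of the table, so it uses O(min-row) memory but still O(n*m) time; recovering the actual sequence of operations would require keeping the full table or back-pointers.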

References

Edit distance (Wikipedia)

256 questions
3
votes
4 answers

Tools to compute graph edit distance (GED)

I have read a lot of theory on computing graph edit distance (GED) and other graph similarity measures (such as http://goo.gl/gmDMgA), but I'm failing to find tools to accomplish such computations. Is there a programming library or software that computes…
Lucien S.
3
votes
1 answer

Custom replacement matrix for edit distance in R

I need to compute the edit distance between two strings based on a custom cost function for replacements. For example, I want to specify a different cost for replacing 'a' with 'b' than for replacing 'a' with 'c'. Is there an R package that allows me to…
bfaskiplar
3
votes
4 answers

Weighted unordered string edit distance

I need an efficient way of calculating the minimum edit distance between two unordered collections of symbols. As with the Levenshtein distance, which only works for sequences, I require insertions, deletions, and substitutions with different…
3
votes
1 answer

Tree Edit Distance: How can I get the optimal mapping?

I have implemented the algorithm by Zhang and Shasha to calculate the minimal edit distance between two trees. Everything is working fine and I am very pleased with the current running times. Now I would also like to generate a diff that highlights…
tux21b
3
votes
1 answer

Given the pairwise edit distance of a and b and b and c, can we find the pairwise edit distance of a and c?

If we have three strings a, b, c and we know (or have already calculated) edit_distance(a,b) and edit_distance(b,c), can we efficiently calculate edit_distance(a,c) without actually comparing a and c? *edit_distance(a,b) = number of character insertion,…
Cerberuz
2
votes
0 answers

Graph edit distance for connected components in a graph - considering the spatial distance

Has anyone ever done or seen something like this? I have two disconnected graphs of the same size with the same nodes but different edges. They may contain connected components. I want to compare one connected component "a" of graph 1 to one…
2
votes
0 answers

Equivalence of Edit Distance and Alignment Distance

(from: https://math.mit.edu/classes/18.417/Slides/alignment.pdf) The slide on the 11th page talks about how the Edit Distance and the Alignment Distance are equivalent. I understand how to prove that the Edit Distance will always be less than or…
2
votes
1 answer

How to extract a custom list of entities from a text file?

I have a list of entities which look something like this: ["Bluechoice HMO/POS", "Pathway X HMO/PPO", "HMO", "Indemnity/Traditional Health Plan/Standard"] This is not an exhaustive list; there are other similar entries. I want to extract these…
2
votes
1 answer

Efficient edit distance

I have a big corpus and I'm trying to find the most similar n-grams in the corpus. For that, I'm using get_close_matches. The problem is that this procedure takes a lot of time. A friend suggested that I convert the n-grams to MD5 and then…
Yanirmr
2
votes
1 answer

Calculating errors between two strings in Java

I would like to calculate the percentage of error between two strings. That is, if we assume that one string is the ground truth and the other is a typed string, I would like to calculate the number of mistakes in the typed…
machinery
2
votes
1 answer

Does the Levenshtein (Edit Distance) algorithm perform faster than O(n*m) in a native graph database?

Would the Levenshtein (edit distance) algorithm have better time complexity in a native graph database such as Neo4j than the current limit of O(n*m)? If so, why?
2
votes
0 answers

Formulate edit distance as matrix multiplication

I am computing a weighted edit distance between two strings using a slight modification of the Levenshtein distance in which I use context-specific edit operation probabilities. Unlike the standard Levenshtein distance, which only considers the best sequence of…
Jindřich
2
votes
0 answers

Given a list of strings, find each string's closest match (edit distance) in another big list of strings

I have a list of strings small_list = ['string1', 'this is string 2', ...] and a larger list of strings big_list = ['is string 2', 'some other string 3', 'string 1', ...]. I want to find the string that is closest by edit distance for all of the…
user281989
2
votes
2 answers

Distance between two sentences in R: word-level comparison by minimum edit distance

While trying to learn R, I want to implement the algorithm below in R. Consider the two lists below: List 1: "crashed", "red", "car" List 2: "crashed", "blue", "bus" I want to find out how many actions it would take to transform 'list1' into…
Zero
2
votes
2 answers

Shortest path from one word to another via valid words (no graph)

I came across this variation of the edit-distance problem: find the shortest path from one word to another, for example storm->power, validating each intermediate word with an isValidWord() function. There is no other access to the dictionary of…
ToyYoda