Questions tagged [edit-distance]

A string metric that quantifies how different two strings are. More specifically, it is the minimum number of operations needed to transform one string into the other. Operations include the insertion, deletion, substitution, or transposition of a character in the string. Which operations are allowed depends on the variant, and each operation may carry a different cost.
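As a rough illustration of the definition above, here is a minimal Python sketch of the classic dynamic-programming approach for the Levenshtein variant (insertion, deletion, substitution; no transposition). The function name `edit_distance` and the cost parameters are illustrative assumptions, not part of any particular library.

```python
def edit_distance(source, target, insert_cost=1, delete_cost=1, substitute_cost=1):
    """Weighted Levenshtein distance using two rows of the DP table.

    Names and default costs are illustrative assumptions.
    Transposition (Damerau-Levenshtein) is not handled here.
    """
    # previous[j]: cost of turning the processed prefix of source
    # into the first j characters of target.
    previous = [j * insert_cost for j in range(len(target) + 1)]
    for i, s_char in enumerate(source, start=1):
        # Turning the first i characters of source into "" takes i deletions.
        current = [i * delete_cost]
        for j, t_char in enumerate(target, start=1):
            keep_or_substitute = previous[j - 1] + (
                0 if s_char == t_char else substitute_cost
            )
            current.append(min(
                previous[j] + delete_cost,      # drop s_char
                current[j - 1] + insert_cost,   # add t_char
                keep_or_substitute,             # match or replace
            ))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))   # 3
print(edit_distance("Sunday", "Saturday"))  # 3
```

Keeping only two rows holds memory at O(len(target)); the running time is still proportional to the product of the two lengths, which matters when the inputs are large.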

References

Edit distance (Wikipedia)

256 questions
2
votes
4 answers

How to check how many characters a variable has in common with another variable

If I have two variables, and I want to see how many characters they have in common, what would I do to reach a number of how many were wrong? For example: a = "word", b = "wind", a - b = 2. Is there a way to do this or to make what is above…
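One possible reading of the example in this excerpt is a position-by-position comparison (a Hamming-style count). The sketch below assumes that reading; the name `count_mismatches` and the handling of unequal lengths are assumptions made for illustration.

```python
def count_mismatches(a, b):
    """Count positions where two strings differ.

    Assumption: when the lengths differ, each extra character in the
    longer string counts as one mismatch.
    """
    paired = sum(1 for x, y in zip(a, b) if x != y)
    return paired + abs(len(a) - len(b))


print(count_mismatches("word", "wind"))  # 2
```

Unlike an edit distance, this count does not allow characters to shift position, so a single inserted character near the start can make most later positions mismatch.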
2
votes
0 answers

Leveraging existing set of known variants (acronyms, abbreviations) for string matching

I'm keeping this language agnostic, as I'm open to any platform that can provide a solution. My current implementation is in Excel/VBA, but I'm investigating Python, JavaScript and SSMS. Are there any existing methods for leveraging a collection of…
ghostrobot
  • 43
  • 4
2
votes
1 answer

Locality-sensitive hashing of strings?

Is there a hash function for strings, such that strings within a small edit distance (for example, misspellings) would map to the same, or very close, hash values, while dissimilar strings would tend not to?
MWB
  • 11,740
  • 6
  • 46
  • 91
2
votes
1 answer

performance issue, edit distance for large strings LCP vs Levenshtein vs SIFT

So I'm trying to calculate the distance between two large strings (about 20-100). The obstacle is performance: I need to run 20k distance comparisons (it takes hours). After investigating, I came across a few algorithms, and I'm having trouble to…
Adi Darachi
  • 2,137
  • 1
  • 16
  • 29
2
votes
1 answer

How to group sentences by edit distance?

I have a large set of sentences (36k, text list) and their POS tags (POS list), and I'd like to group/cluster the elements in the POS list using edit distance/Levenshtein: (e.g. Sentx POS tags= [CC DT VBZ RB JJ], Senty POS tags= [CC DT VBZ…
2
votes
1 answer

Calculate edit distance percentage

I am attempting to get a percentage of an edit distance from a group of sequences. So far this is what I have: library(stringdist) sequence <- c("CA--------W----------------------EKDRRTEAF---F------", …
El David
  • 375
  • 2
  • 3
  • 11
2
votes
2 answers

use edit distance on arrays in perl

I am attempting to compare the edit distance between two arrays. I have tried using Text::Levenshtein. #!/usr/bin/perl -w use strict; use Text::Levenshtein qw(distance); my @words = qw(four foo bar); my @list = qw(foo fear); my @distances =…
2
votes
1 answer

Is there any algorithm to compute edit distance between two graphs including same nodes?

First, I know there has been a lot of work on computing the edit distance between two graphs. But most of the GED algorithms are applied in general cases. Now considering my case, there are two graphs G(V1,E1) and G(V2,E2). Vk is a set of nodes which…
Yu Gu
  • 2,382
  • 5
  • 18
  • 33
2
votes
2 answers

What tool or algorithm should I use to generate words from a keyword which is at a given Damerau–Levenshtein distance?

Damerau-Levenshtein distance is like: "abcd", "aacd" => 1 DL distance "abcd", "aadc" => 2 DL distance More about editdistance: https://pypi.python.org/pypi/editdistance More about Damerau-Levenshtein…
Kroy
  • 299
  • 1
  • 5
  • 18
2
votes
1 answer

neo4j edit distance search

I am running neo4j 3.0.4 and want to do a search on a node property using an edit distance of 1. I searched the documentation and couldn't find anything; the closest I found was regex search. Any help would be appreciated.
Nikhil
  • 51
  • 5
2
votes
2 answers

Is there an edit distance metric which doesn't rely on order at all?

For example, let's say I have these two lists: var a = [1,2,3]; var b = [3,2,1]; The Levenshtein distance between them would be 2. I'm looking for a metric where the distance would be 0, i.e. lists with the same elements are regarded as the same…
user377628
2
votes
3 answers

Edit Distance (Dynamic Programming): Aren't insertion and deletion the same thing?

In looking through the dynamic programming algorithm for computing the minimum edit distance between two strings, I am having a hard time grasping one thing. To me it seems like, given the two strings s and t, inserting a character into s would be the…
Mike Sweeney
  • 1,896
  • 2
  • 18
  • 20
2
votes
1 answer

Convert a string to another string in the shortest path

I have two strings, say str1 and str2. I need to convert the first one to the second one while making the least number of edits. This is what is called edit distance. Suppose we need to convert Sunday to Saturday. The first letter is the same,…
SexyBeast
  • 7,913
  • 28
  • 108
  • 196
2
votes
0 answers

Compute edit distance between 2 very large strings

SCENARIO: Given 2 input strings, I need to find the minimum number of insertions, deletions, and substitutions required to convert one string to the other. The strings are text from 2 files. The comparison has to be done at word level. What I have done is…
uzair_syed
  • 313
  • 3
  • 16
2
votes
1 answer

Similarity matrix for weighted edit distance

I wanted to implement a modification of the basic edit distance algorithm, namely the weighted edit distance. (Context: spelling errors while trying to create a search engine.) For example, the cost of substituting s by a would be lower than…