Questions tagged [edit-distance]

A string metric that quantifies the difference between two strings. More specifically, it is the minimum number of operations needed to transform one string into the other. Operations include the insertion, deletion, substitution, or transposition of a character. Variants differ in which combinations of operations they allow and may assign the operations different costs.
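As a concrete illustration of the definition, here is a minimal sketch of the classic Levenshtein variant (insertion, deletion, and substitution only, each at an assumed cost of 1, no transposition). The function name `levenshtein` is chosen for this example and is not taken from any particular library.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b` (unit costs)."""
    # prev[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep a match)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

Only two rows of the dynamic-programming table are kept, so memory use is proportional to the length of `b`; supporting transpositions or non-unit costs would need a different recurrence.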

References

Edit distance (Wikipedia)

256 questions
2
votes
2 answers

JavaScript find edit distance not returning correct value

I'm working on a function that computes the edit distance of two strings, but according to this online calculator I'm getting an incorrect value. I'm getting 19 and the calculator is returning 7. I'm not sure what's wrong with my program. I based it off of…
Daniel Kobe
2
votes
3 answers

How to convert a Python/Cython unicode string to an array of long integers, to do Levenshtein edit distance

Possible Duplicate: How to correct bugs in this Damerau-Levenshtein implementation? I have the following Cython code (adapted from the bpbio project) that does Damerau-Levenshtein edit-distance…
flow
2
votes
0 answers

How to speed up EDIT_DISTANCE and Insert Query?

MASTER TABLE: DATA_KEY NUMBER, TEXT VARCHAR2(2000), ORDER_NO NUMBER
DETAIL TABLE: DATA_KEY NUMBER, SIMILAR_DATA_KEY NUMBER, DISTANCE_COUNT NUMBER
INSERT…
2
votes
1 answer

Levenshtein distance where I only care about words

I want to check the distance between two strings in terms of inserting/deleting/editing words. This is similar to the levenshtein distance, but I only care about words, rather than characters. For example: "The cat sat on the mat" & "Dog sat…
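One possible way to approach this, sketched under assumptions: split each string on whitespace and run the usual Levenshtein recurrence over the resulting word lists instead of over characters. The name `word_edit_distance`, the whitespace tokenisation, and the unit costs are all choices made for this illustration; case folding and punctuation handling are left out.

```python
def word_edit_distance(s: str, t: str) -> int:
    """Levenshtein distance computed over whitespace-separated words
    (insert, delete, or replace a whole word, each at cost 1)."""
    a, b = s.split(), t.split()  # assumption: words are split on whitespace
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(
                prev[j] + 1,         # delete word wa
                curr[j - 1] + 1,     # insert word wb
                prev[j - 1] + cost,  # replace wa with wb (or keep a match)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical inputs, not the ones from the question.
    print(word_edit_distance("the cat sat", "the dog sat"))
```

Because the recurrence only compares items for equality, the same code works on any pair of sequences; only the tokenisation step is specific to words.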
Tarrare
2
votes
1 answer

Levenshtein Edit Distance is not calculating edit distance

I am trying to get my Levenshtein Edit Distance algorithm working but for some reason, the number of edits is coming out incorrect. I can't see where my mistake is and I was wondering if someone sees what I am doing…
Ryan Newman
2
votes
2 answers

How can I compare different rows of one column with Levenshtein distance metric in pandas?

I have a table like this:

id  name
1   gfh
2   bob
3   boby
4   hgf

etc. I am wondering how I can use the Levenshtein metric to compare different rows of my 'name' column. I already know that I can use this to compare columns: L.distance('Hello, Word!', 'Hallo,…
UserYmY
2
votes
2 answers

String distance metric that favors substrings and is independent of word order?

For my data analytics problem, I usually need to regularize names: given names A and B, I'd consider them the same or very similar if A and B share a substantial number of common substrings, regardless of the order of those substrings. For example,…
Yu Shen
2
votes
1 answer

Asymmetric Levenshtein distance

Given two bit strings, x and y, with x longer than y, I'd like to compute a kind of asymmetric variant of the Levenshtein distance between them. Starting with x, I'd like to know the minimum number of deletions and substitutions it takes to turn x…
2
votes
4 answers

Using base64 encoding as a mechanism to detect changes

Is it possible to use changes in the base64 encoding of an object to detect the degree of change in the object? Suppose I send a document attachment to several users, and each makes changes to it and emails it back to me. Can I use the string…
Mikos
2
votes
2 answers

Available code to compute affine gap distance

Given the ubiquitous availability of code (in C, R, Python, Java) that computes the Levenshtein edit distance, I am somewhat surprised at the lack of implementations of other edit distances such as the affine gap distance. Are there easily usable…
Markus Loecher
2
votes
2 answers

Minimum edit distance of zig zag string

I have a string like this, xxoxxooo, and I want to edit it into this form: xoxoxoxo. My question is how to find the minimum number of swaps, where a swap can only exchange two neighbours. I thought about going through the string and finding the closest redundant x and…
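For a string over only two symbols and a fixed target, one way to count the swaps is to pair the k-th x in the source with the k-th x in the target and add up how far each has to move; every neighbour swap moves exactly one x by one position, so that sum is the minimum. The sketch below assumes a two-symbol alphabet and a target given explicitly; `min_adjacent_swaps` and its `mark` parameter are names invented for this example.

```python
def min_adjacent_swaps(source: str, target: str, mark: str = "x") -> int:
    """Minimum number of neighbour swaps turning `source` into `target`,
    assuming both strings use the same two symbols."""
    src = [i for i, ch in enumerate(source) if ch == mark]
    tgt = [i for i, ch in enumerate(target) if ch == mark]
    if len(source) != len(target) or len(src) != len(tgt):
        raise ValueError("strings must have equal length and symbol counts")
    # Pair the k-th marked position in the source with the k-th in the
    # target; each neighbour swap shifts one marked symbol by one place.
    return sum(abs(s - t) for s, t in zip(src, tgt))


if __name__ == "__main__":
    print(min_adjacent_swaps("xxoxxooo", "xoxoxoxo"))
```

If either alternating pattern is acceptable as the result, the function can be called once per candidate target and the smaller count taken.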
c0ntrol
2
votes
3 answers

Edit Distance with accents

Is there an edit distance in Python that takes accents into account, for example one where the following property holds: d('ab', 'ac') > d('àb', 'ab') > 0
vigte
2
votes
1 answer

Clustering string data with ELKI

I need to cluster a large number of strings using ELKI based on the Edit Distance / Levenshtein Distance. Since the data set is too large, I'd like to avoid file based precomputed distance matrices. How can I (a) load string data in ELKI from a file…
Stahli
1
vote
0 answers

Search engine string matching

What is the typical algorithm used by online search engines to make suggestions for misspelled words? I'm not necessarily talking about Google, but any site with a search feature, such as Amazon.com for instance. Say I search for the word…
oym
1
vote
3 answers

Pseudocode for script to check transcription accuracy / edit distances

I need to write a script, probably in Ruby, that will take one block of text and compare a number of transcriptions of recordings of that text to the original to check for accuracy. If that's just completely confusing, I'll try explaining another…
GarlicFries