Questions tagged [edit-distance]

A string metric that quantifies the difference between two strings. More specifically, it is the minimum number of operations needed to transform one string into the other. Operations include the insertion, deletion, substitution, or transposition of a character in the string. Operations can be combined and may have different costs.
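
As a rough illustration of the definition above, here is a minimal Python sketch of the classic dynamic-programming (Wagner-Fischer style) computation. It covers insertion, deletion, and substitution only, not transposition. The function name `edit_distance` and the cost parameters `ins_cost`, `del_cost`, and `sub_cost` are hypothetical names chosen for this example, not part of any library.

```python
def edit_distance(a, b, ins_cost=1, del_cost=1, sub_cost=1):
    """Minimum total cost of edits turning string a into string b.

    Hypothetical helper for illustration: supports insertion, deletion
    and substitution with configurable (assumed non-negative) costs.
    """
    # prev[j] holds the cost of turning the empty prefix of a into b[:j],
    # which takes j insertions.
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        curr = [i * del_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + del_cost,                          # delete ca
                curr[j - 1] + ins_cost,                      # insert cb
                prev[j - 1] + (0 if ca == cb else sub_cost)  # match or substitute
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3 with unit costs
```

Only two rows of the matrix are kept at a time, so memory use is proportional to the length of `b`. Recovering the actual sequence of edits would require keeping the full matrix.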

References

Edit distance (Wikipedia)

256 questions
3
votes
3 answers

How to return names "with different spelling" from dataframe

As you know, a lot of names have multiple spellings. I have a dataset that has first and last names, but I have an issue with spelling variations. Here is a sample from the dataset: firstName lastName 0 Ali Khaled 1 Hamada …
3
votes
2 answers

How to find by how much distance two words differ? Is there any shorter way to do this?

I have read about Levenshtein distance for calculating the distance between two distinct words. I have one source string and I have to match it with all 10,000 target words. The closest word should be returned. The problem is I have…
AGeek
  • 5,165
  • 16
  • 56
  • 72
3
votes
1 answer

Add counter and distance to dictionary

Hello, I have a specific string and I am trying to calculate its distance using edit distance. I want to see the number of times the string occurs and then sort it. str = "Hello" and a txt file named xfile I am comparing with…
girlwhocodes
  • 188
  • 6
3
votes
1 answer

Compute edit distance for a dataframe which has only one column and multiple rows in Python

I have a dataframe which has one column and more than 2000 rows. How can I compute the edit distance between the rows of the same column? My dataframe looks like this: Name John Mrinmayee rituja ritz divya priyanka chetna chetan …
Sayli Jawale
  • 159
  • 1
  • 18
3
votes
0 answers

Block edit distance with Swapping only

Suppose I have an alphabet of distinct symbols ∑={a1,a2,...,an}. I also have two permutations of these symbols, let's call them A and B. How can I find the edit distance between A and B with block edit operations allowed? To make it clearer, an example would be…
AspiringMat
  • 2,161
  • 2
  • 21
  • 33
3
votes
1 answer

Percentage edit distance from array

I am attempting to get a percentage of an edit distance from a group of sequences. So far this is what I have: #!/usr/bin/perl -w use strict; use Text::Levenshtein qw(distance); my @sequence = qw(CA--------W----------------------EKDRRTEAF---F------…
El David
  • 375
  • 2
  • 3
  • 11
3
votes
1 answer

Edit Distance Matrix

I'm trying to build a program that takes two strings and fills in the edit distance matrix for them. The thing that is tripping me up is, for the second string input, it is skipping over the second input. I've tried clearing the buffer with getch(),…
Dylan Forsyth
  • 57
  • 1
  • 7
3
votes
2 answers

Levenshtein distance with non-uniform cost for insertions and substitutions

I have been trying to implement a Levenshtein distance function in C++ that gives different weights to substitutions and insertions based on which characters are being replaced or inserted. The cost is calculated based on the distance of the keys…
KaziJehangir
  • 295
  • 1
  • 3
  • 9
3
votes
0 answers

Algorithm to find all substrings of a string within a given edit distance of another string

I know the title is a bit messy, so let me explain in detail: I have two strings, T and P. T represents the text to be searched, and P represents the pattern to be searched for. I want to find ALL substrings of T which are within a given edit…
user129186
  • 1,156
  • 2
  • 14
  • 30
3
votes
1 answer

How to correct bugs in this Damerau-Levenshtein implementation?

I'm back with another longish question. Having experimented with a number of Python-based Damerau-Levenshtein edit distance implementations, I finally found the one listed below as editdistance_reference(). It seems to deliver correct results and…
flow
  • 3,624
  • 36
  • 48
3
votes
2 answers

Edit distance algorithm explanation

According to Wikipedia, the definition of the recursive formula which calculates the Levenshtein distance between two strings a and b is the following: I don't understand why we don't take into consideration the cases in which we delete a[j], or we…
rondo
  • 63
  • 8
3
votes
2 answers

Explanation of normalized edit distance formula

Based on this paper: IEEE TRANSACTIONS ON PATTERN ANALYSIS: Computation of Normalized Edit Distance and Applications. In this paper, normalized edit distance is defined as follows: Given two strings X and Y over a finite alphabet, the normalized edit …
jxn
  • 7,685
  • 28
  • 90
  • 172
3
votes
2 answers

Levenshtein matrix cell calculation

I do not understand how the values in the Levenshtein matrix are calculated according to this article. I do know how we arrive at the edit distance of 3. Could someone explain in layman's terms how we arrive at each value in each cell?
jxn
  • 7,685
  • 28
  • 90
  • 172
3
votes
1 answer

Optimizing Levenshtein distance algorithm

I have a stored procedure that uses Levenshtein distance to determine the result closest to what the user typed. The only thing really affecting the speed is the function that calculates the Levenshtein distance for all the records before selecting…
Matt
  • 5,547
  • 23
  • 82
  • 121
3
votes
2 answers

Determining a sequence of edits that produces the Levenshtein distance

I am doing some work using Levenshtein (edit) distance using dynamic programming. I think I understand the Wagner-Fischer algorithm to do this efficiently. However, it doesn't look like the algorithm is constructive. If I compute that the edit…