Questions tagged [edit-distance]

A string metric that quantifies how different two strings are. More specifically, it is the minimum number of operations needed to transform one string into the other. Operations include the insertion, deletion, substitution, or transposition of a character in the string. Different sets of operations can be allowed, and each operation may have a different cost.
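As a rough illustration of the definition above, here is a minimal sketch of the classic dynamic-programming computation, assuming only insertion, deletion, and substitution are allowed (transposition is left out). The function name `edit_distance` and the cost parameters `ins_cost`, `del_cost`, and `sub_cost` are hypothetical names chosen for this example, not part of any library.

```python
def edit_distance(a, b, ins_cost=1, del_cost=1, sub_cost=1):
    """Weighted Levenshtein distance between two sequences (illustrative sketch).

    Works on strings or on lists of tokens. Transposition is not modelled.
    """
    # prev[j] holds the cost of turning the empty prefix of `a` into b[:j].
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, item_a in enumerate(a, start=1):
        # Turning a[:i] into the empty sequence takes i deletions.
        curr = [i * del_cost]
        for j, item_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + del_cost,                                 # delete item_a
                curr[j - 1] + ins_cost,                             # insert item_b
                prev[j - 1] + (0 if item_a == item_b else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(edit_distance("a b c".split(), "a c".split()))
```

Keeping only two rows of the table uses memory proportional to the length of the second sequence, while the running time stays proportional to the product of the two lengths.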

References

Edit distance (Wikipedia)

256 questions
4
votes
1 answer

networkx: how to set custom cost function?

I am following the networkx documentation (1) and I would like to set different penalties for the cost functions (e.g. node_del_cost and node_ins_cost). Let's say I would like to penalize deletion/insertion of a node by three points. So far, I have created two…
Olha Kholod
4
votes
1 answer

Levenshtein distance from index 0

I've been working through "The Algorithm Design Manual" section 8.2.1 Edit Distance by Recursion. In this section Skiena writes, "We can define a recursive algorithm using the observation that the last character in the string must either be matched,…
4
votes
2 answers

Token-based edit distance in Python?

I'm familiar with Python's nltk.metrics.distance module, which is commonly used to compute the edit distance of two strings. I am interested in a function which computes such a distance, not char-wise as usual but token-wise. By that I mean that you…
petrbel
4
votes
1 answer

Fast approximate string difference for large strings

I'm trying to quantify the difference between two strings as part of a change-monitor system. The issue I'm having is that the strings are large - I can often be dealing with strings with 100K+ characters. I'm currently using Levenshtein distance,…
Fake Name
4
votes
1 answer

Faster edit distance algorithm

Problem: I know the trivial edit distance DP formulation and computation in O(mn) for 2 strings of size n and m respectively. But I recently came to know that if we only need to calculate the minimum value of edit distance f and it is bounded…
v78
4
votes
4 answers

Levenshtein distance with weight/penalty for adjacency

I am using the string-edit distance (Levenshtein-distance) to compare scan paths from an eye tracking experiment. (Right now I am using the stringdist package in R) Basically the letters of the strings refer to (gaze) position in a 6x4 matrix. The…
4
votes
1 answer

Complexity of edit distance (Levenshtein distance) recursion top down implementation

I have been working all day on a problem which I can't seem to get a handle on. The task is to show that a recursive implementation of edit distance has the time complexity Ω(2^max(n, m)), where n and m are the lengths of the words being measured. The…
4
votes
3 answers

Generating a list of distinct (distant, by edit distance) words by filtering

I have a long (> 1000 items) list of words, from which I would like to remove words that are "too similar" to other words, until the remaining words are all "significantly different". For example, so that no two words are within an edit distance…
andrew cooke
4
votes
3 answers

Edit distance explanation

I have seen a lot of code that solves this, but I am not able to see why they use a matrix to represent the distance between two words. Can anyone please explain it to me? Here is a sample code I found: public static int minDistance(String word1,…
Srujan Kumar Gulla
4
votes
2 answers

How to find all strings at a given edit distance from a given string

We all have seen in Google, that if we type a query, and make a typo, Google suggests a saner version of the query (which is correct more often than not). Now how do they do it? One possible way I can think of is find out all other strings at an…
SexyBeast
4
votes
5 answers

How to determine differences in two lists of data

This is an exercise for the CS guys to shine with the theory. Imagine you have 2 containers with elements. Folders, URLs, Files, Strings, it really doesn't matter. What is AN algorithm to calculate the added and the removed? Notice: If there are…
Gustavo Carreno
3
votes
1 answer

solr fuzzy search with edit distance above 1

Environment: java version "11.0.12" 2021-07-20 LTS, solr-8.9.0. I have the following field declaration for my Solr index:
user595014
3
votes
3 answers

Select similar sentences

If I have a set of sentences and I would like to extract the duplicates, I should work like in the following example: sentences<-c("So there I was at the mercy of three monstrous trolls", "Today is my One Hundred and Eleventh birthday", …
Mark
3
votes
0 answers

Is it possible to perform clustering on asymmetrical cost matrix

I have generated a cost matrix from a graph edit distance algorithm. Every entry (e.g. 'TCGA-05-4420') corresponds to a specific graph. import numpy as np import pandas as pd import matplotlib.pyplot as plt d = {'TCGA-05-4420': pd.Series([0, 907, 866,…
Olha Kholod
3
votes
2 answers

How to fit strings using spaces, minimizing edit distance?

I'm looking for an algorithm that fits two strings, filling them up with spaces if necessary to minimize edit distance between them: fit('algorithm', 'lgrthm') == ' lg r thm' There sure must be some prewritten algorithm for this. Any ideas?
Barney Szabolcs