According to Wikipedia, the Levenshtein distance can be calculated using the following piece of pseudocode:
int LevenshteinDistance(string s, string t)
{
    int len_s = length(s), len_t = length(t), cost = 0;
    // Handle the empty cases first, so s[0] and t[0] are never read on an empty string.
    if (len_s == 0)
        return len_t;
    if (len_t == 0)
        return len_s;
    if (s[0] != t[0])
        cost = 1;
    return minimum(
        LevenshteinDistance(s[1..len_s], t) + 1,              // delete s[0]
        LevenshteinDistance(s, t[1..len_t]) + 1,              // insert t[0]
        LevenshteinDistance(s[1..len_s], t[1..len_t]) + cost); // match or substitute
}
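For reference, here is a minimal runnable sketch of the same recursion in Python. The function name `levenshtein` and the index-based helper `go` are my own choices, not something from Wikipedia, and the memoization via `functools.lru_cache` is an addition to keep the otherwise exponential recursion tractable.

```python
from functools import lru_cache


def levenshtein(s: str, t: str) -> int:
    """Plain recursive Levenshtein distance, memoized on suffix positions."""

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> int:
        # go(i, j) is the distance between the suffixes s[i:] and t[j:].
        if i == len(s):
            return len(t) - j  # only insertions remain
        if j == len(t):
            return len(s) - i  # only deletions remain
        cost = 0 if s[i] == t[j] else 1
        return min(
            go(i + 1, j) + 1,         # delete s[i]
            go(i, j + 1) + 1,         # insert t[j]
            go(i + 1, j + 1) + cost,  # match or substitute
        )

    return go(0, 0)
```

Working with indices instead of slicing avoids copying the strings at every call. The recursion depth grows with the combined length of the inputs, so this sketch is only suitable for short sequences.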
If I understand your requirement correctly, you want differences at the beginning of the collection to be more significant than differences towards the end. Let's adapt this recursive function to reflect that.
float LevenshteinDistance(string s, string t, float decay)
{
    int len_s = length(s), len_t = length(t), cost = 0;
    // Handle the empty cases first, so s[0] and t[0] are never read on an empty string.
    if (len_s == 0)
        return len_t;
    if (len_t == 0)
        return len_s;
    if (s[0] != t[0])
        cost = 1;
    // Every level of recursion scales the result of the levels below it by decay.
    return decay * minimum(
        LevenshteinDistance(s[1..len_s], t, decay) + 1,
        LevenshteinDistance(s, t[1..len_t], decay) + 1,
        LevenshteinDistance(s[1..len_s], t[1..len_t], decay) + cost);
}
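When decay is a parameter in the interval (0,1), differences at larger indices become less significant than differences at earlier ones: each level of the recursion multiplies its result by decay, so a cost incurred at recursion depth k is scaled by decay^(k+1).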
Here's an example for decay=0.9:
s       t       dist
"1234"  "1234"  0.0000
"1234"  "1243"  1.3851
"1234"  "2134"  1.6290
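To check the table above, here is a small Python sketch of the decayed recursion. The name `decayed_levenshtein` and the memoized, index-based helper `go` are assumptions of mine for illustration; the arithmetic follows the pseudocode, with the empty-string checks done before any character is read.

```python
from functools import lru_cache


def decayed_levenshtein(s: str, t: str, decay: float) -> float:
    """Levenshtein-style distance in which later differences count less."""

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> float:
        # go(i, j) is the decayed distance between s[i:] and t[j:].
        if i == len(s):
            return float(len(t) - j)
        if j == len(t):
            return float(len(s) - i)
        cost = 0 if s[i] == t[j] else 1
        # Scaling by decay at every level discounts costs found deeper in.
        return decay * min(
            go(i + 1, j) + 1,
            go(i, j + 1) + 1,
            go(i + 1, j + 1) + cost,
        )

    return go(0, 0)


for t in ("1234", "1243", "2134"):
    print(f'"1234" "{t}" {decayed_levenshtein("1234", t, 0.9):.4f}')
# "1234" "1234" 0.0000
# "1234" "1243" 1.3851
# "1234" "2134" 1.6290
```

With decay=1.0 this reduces to the ordinary Levenshtein distance, which is a quick sanity check when trying other values.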