
I have columns 'names' and 'ids' in table "OG" and want to find pairs of names whose last letters differ and whose total edit distance is two. So far I have:

SELECT
z1.names AS names1, z2.names AS names2, z1.ids, z2.ids
FROM (SELECT t.names, t.ids, SUBSTRING(t.names FOR length(t.names) - 1) AS newnames
FROM "OG" t) z1, (SELECT r.names, r.ids, SUBSTRING(r.names FOR length(r.names) - 1) AS
newnames1 FROM "OG" r) z2
WHERE levenshtein(z1.newnames, z2.newnames1) = 2 AND z1.ids != z2.ids

Unfortunately, this doesn't ensure that the last letters are different. Any ideas for a fix?

1 Answer


Check the last characters as well:

WHERE levenshtein(z1.newnames, z2.newnames1) = 2 AND z1.ids != z2.ids
AND substring(z1.names, length(z1.names)) <> substring(z2.names, length(z2.names))
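
For context, here is a sketch of how the whole query might look with that extra condition in place. It assumes PostgreSQL with the fuzzystrmatch extension (which provides levenshtein), that the id column is named ids as described in the question, and that the subqueries need to select ids so the outer query can reference it. The output aliases ids1 and ids2 are made up for readability.

```sql
-- Assumption: PostgreSQL with fuzzystrmatch installed
-- (CREATE EXTENSION fuzzystrmatch;), which provides levenshtein().
SELECT z1.names AS names1,
       z2.names AS names2,
       z1.ids   AS ids1,   -- hypothetical alias
       z2.ids   AS ids2    -- hypothetical alias
FROM (SELECT t.names, t.ids,
             substring(t.names FOR length(t.names) - 1) AS newnames
      FROM "OG" t) z1,
     (SELECT r.names, r.ids,
             substring(r.names FOR length(r.names) - 1) AS newnames1
      FROM "OG" r) z2
WHERE levenshtein(z1.newnames, z2.newnames1) = 2   -- names minus last letter
  AND z1.ids != z2.ids                             -- skip self-pairs
  -- substring(s, length(s)) returns only the last character of s
  AND substring(z1.names, length(z1.names))
      <> substring(z2.names, length(z2.names));
```

Each matching pair will appear twice (once in each order); replacing `z1.ids != z2.ids` with `z1.ids < z2.ids` would keep only one of the two, if that is what you want.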

Note that SUBSTRING(t.names for length(t.names)-1) in your query will fail when the string is empty (not null), because the length argument becomes -1.
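
One possible way to guard against that, sketched here under the assumption that this is PostgreSQL, is to clamp the length at zero with greatest(). The inline VALUES list and the alias v(n) are just sample data for illustration, not part of the original table.

```sql
-- Clamp the prefix length at 0 so an empty string produces ''
-- instead of a negative-length error.
SELECT n AS names,
       substring(n FOR greatest(length(n) - 1, 0)) AS newnames
FROM (VALUES ('abc'), ('a'), ('')) AS v(n);   -- hypothetical sample rows
```

Alternatively, the subqueries could simply exclude empty names with `WHERE t.names <> ''`, if empty names should never take part in the comparison anyway.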

RichardTheKiwi