I have a list of strings and I want to filter out strings that are too similar to each other, based on Levenshtein distance. For example, if `lev(list[0], list[10]) < 50`, then `list[10]` should be removed. Is there a more efficient way to calculate this distance between every pair of strings in the list? Thanks!
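```python
from Levenshtein import distance as levenshtein  # assumed source: the `Levenshtein` package on PyPI

data2 = []
for s in data:
    # Keep s only if it differs by at least 50 edits from every string kept so far.
    # This avoids deleting from `data` while iterating over it (and comparing s with itself).
    if all(levenshtein(s, kept) >= 50 for kept in data2):
        data2.append(s)
```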
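The post does not show where `levenshtein` comes from. If no library is available, a minimal pure-Python version using the standard two-row dynamic-programming table could look like the sketch below. It is an assumption, not necessarily the function used above, and it will be much slower than a C-implemented library function.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions all cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    prev = list(range(len(b) + 1))  # row for the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```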
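This naive code takes far too long to run: it computes a Levenshtein distance for (up to) every pair of strings, i.e. O(n²) distance calls, and each call is itself proportional to the product of the two string lengths.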
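Because only the question "is the distance below 50?" matters, not the exact distance, one possible speed-up is a threshold-aware check. The sketch below is one such approach, under the assumption that a greedy "keep the first, drop later near-duplicates" filter is what is wanted; `within` and `dedupe` are hypothetical helper names. It skips the computation when the length difference alone (a lower bound on the distance) already reaches the limit, and it stops the dynamic-programming loop early once every value in the current row reaches the limit, since row minima never decrease.

```python
from typing import List


def within(a: str, b: str, limit: int) -> bool:
    """Hypothetical helper: True if levenshtein(a, b) < limit, exiting early when possible."""
    if abs(len(a) - len(b)) >= limit:
        return False  # |len(a) - len(b)| is a lower bound on the edit distance
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        if min(curr) >= limit:
            return False  # later rows can only be >= this minimum, so distance >= limit
        prev = curr
    return prev[-1] < limit


def dedupe(data: List[str], limit: int = 50) -> List[str]:
    """Hypothetical helper: keep each string unless it is within `limit` of one already kept."""
    kept: List[str] = []
    for s in data:
        if not any(within(s, k, limit) for k in kept):
            kept.append(s)
    return kept


print(dedupe(["apple", "apples", "banana"], limit=2))  # ['apple', 'banana']
```

This still performs up to O(n²) comparisons in the worst case (when few strings are dropped), but many comparisons become cheap or are skipped entirely. Combining the same early-exit idea with a C-implemented distance function would likely help further.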