In a pandas DataFrame, I need to remove rows whose text values are too close to each other in terms of Levenshtein distance (similarity ratio above 0.9). An inefficient implementation is:
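
    import Levenshtein

    to_drop = set()  # index labels of near-duplicates (assumes a unique index)
    for index, row in df.iterrows():
        if index in to_drop:
            continue  # row already marked as a near-duplicate
        text1 = row['text']
        for index2, row2 in df.iterrows():
            text2 = row2['text']
            lev_ratio = Levenshtein.ratio(text1, text2)
            # skip self-comparison and rows that are already marked
            if index2 != index and index2 not in to_drop and lev_ratio > 0.9:
                to_drop.add(index2)
    # drop once at the end instead of mutating df while iterating over it
    df.drop(list(to_drop), inplace=True)
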
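For concreteness, the behaviour I am after can be sketched on a plain list of strings. This is only an illustration under some assumptions: `near_duplicate_mask` and `difflib_ratio` are hypothetical helper names, and `difflib.SequenceMatcher` from the standard library stands in for `Levenshtein.ratio` so the snippet runs without extra packages (the two ratios are similar in spirit but not identical).

```python
from difflib import SequenceMatcher


def difflib_ratio(a, b):
    # Stand-in similarity in [0, 1]; NOT the same metric as Levenshtein.ratio.
    return SequenceMatcher(None, a, b).ratio()


def near_duplicate_mask(texts, ratio=difflib_ratio, threshold=0.9):
    """Return one boolean per text: True if the text is kept, False if it
    is too similar to a text that was kept earlier in the list."""
    kept = []  # texts retained so far
    mask = []
    for text in texts:
        # Compare only against texts that were kept, never against itself.
        is_dup = any(ratio(text, other) > threshold for other in kept)
        mask.append(not is_dup)
        if not is_dup:
            kept.append(text)
    return mask


texts = ["hello world", "hello world!", "something else", "hello world"]
mask = near_duplicate_mask(texts)
print(mask)  # [True, False, True, False]
```

With the real data, I would expect something like `df.loc[near_duplicate_mask(df['text'].tolist(), Levenshtein.ratio)]` to give the same rows as the loop above. It still needs a quadratic number of comparisons in the worst case.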
Is there a more efficient way?