Compare each row in a column with every other row in the same column and remove the row if the match ratio is > 90, using fuzzy matching in Python. I tried removing duplicates, but some rows have the same content plus some extra information, so they are not exact duplicates. The data looks like this:
print(df)
Output:
Page no
0 Hello
2 Hey
3 Helloo
4 Heyy
5 Hellooo
I'm trying to compare each row with every other row and remove a row if its content matches another row with a ratio greater than 90, using fuzzy matching. The expected output is:
Page no
0 Hello
2 Hey
The code I tried is:
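from fuzzywuzzy import fuzz  # assumed library; thefuzz and rapidfuzz also provide fuzz.ratio

def func(name):
    # Flag every row whose Content matches `name` with a ratio of at least 90
    matches = df.apply(lambda row: fuzz.ratio(row['Content'], name) >= 90, axis=1)
    print(matches)
    # Positions of the rows that matched
    return [i for i, x in enumerate(matches) if x]

func("Hey")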
The above code only checks the rows against a single sentence ("Hey"), not every row against every other row.
Can anyone please help me extend this so that every row is compared with every other row? It would be really helpful.
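
For reference, here is a rough sketch of the all-rows comparison I have in mind, not a tested solution. It keeps the first occurrence of each value and drops any later value that scores above the threshold against a value already kept. To keep it self-contained it works on a plain list and uses difflib.SequenceMatcher as a stand-in scorer; the names similarity and keep_positions are made up for this example. One thing I noticed while writing it: with this scorer "Heyy" against "Hey" scores about 86 and "Hellooo" against "Hello" about 83, so a threshold of 90 does not give the expected output above, while a threshold of 80 does.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in scorer (an assumption, not fuzz.ratio itself): score from 0 to 100."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def keep_positions(values, threshold=90, scorer=similarity):
    """Return positions of values that are not near-duplicates of an earlier kept value."""
    kept = []
    for i, value in enumerate(values):
        # Skip the value if it scores above the threshold against any kept value
        if any(scorer(value, values[j]) > threshold for j in kept):
            continue
        kept.append(i)
    return kept


contents = ["Hello", "Hey", "Helloo", "Heyy", "Hellooo"]

strict = keep_positions(contents, threshold=90)
loose = keep_positions(contents, threshold=80)

print([contents[i] for i in strict])  # ['Hello', 'Hey', 'Heyy', 'Hellooo']
print([contents[i] for i in loose])   # ['Hello', 'Hey']
```

With the DataFrame, the same idea would presumably be df.iloc[keep_positions(df['Content'].tolist(), scorer=fuzz.ratio)], assuming the text column is really named Content. This compares every pair in the worst case, so it may be slow on large frames.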