How can I create columns that show the respective similarity indices for each row?
This code
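import pandas as pd
from fuzzywuzzy import fuzz  # assumption: thefuzz and rapidfuzz expose the same fuzz.partial_ratio

def func(name):
    # Flag every row whose name has a partial ratio of at least 85 against `name`
    matches = try_test.apply(lambda row: (fuzz.partial_ratio(row['name'], name) >= 85), axis=1)
    return [try_test.word[i] for i, x in enumerate(matches) if x]

try_test.apply(lambda row: func(row['name']), axis=1)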
returns the matches that satisfy the condition >= 85 (the word of each matching row). However, I would also like to have the similarity values themselves, obtained by comparing each name to all the others.
The dataset is
try_test = pd.DataFrame({'word': ['apple', 'orange', 'diet', 'energy', 'fire', 'cake'],
                         'name': ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken']})
Help would be very appreciated.
Expected output (values are just an example):
word    name     sim_index1  sim_index2  sim_index3  ...  sim_index6
apple   dog      100         0
orange  cat                  100
...     mad cat              0.6         100
On the diagonal there is a value of 100 because I am comparing dog with dog, and so on. I might also consider another approach if you think it would be better.
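To make the expected output concrete, here is a minimal sketch of the pairwise comparison, with one sim_index column per name. It is only an illustration under stated assumptions: difflib.SequenceMatcher from the standard library stands in for fuzz.partial_ratio so the sketch runs without extra packages, and the helper name similarity is hypothetical. The scores will therefore differ from what partial_ratio would give.

```python
from difflib import SequenceMatcher

words = ['apple', 'orange', 'diet', 'energy', 'fire', 'cake']
names = ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken']


def similarity(a, b):
    """Score from 0 to 100. Stand-in for fuzz.partial_ratio (assumption)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


# One row per name, one sim_index column per name it is compared against.
rows = []
for word, name in zip(words, names):
    row = {'word': word, 'name': name}
    for j, other in enumerate(names, start=1):
        row[f'sim_index{j}'] = similarity(name, other)
    rows.append(row)

for row in rows:
    print(row)
```

If pandas is available, pd.DataFrame(rows) should turn this list of dicts into a table shaped like the expected output, and swapping fuzz.partial_ratio into similarity should give the partial-match scores instead.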