
I have a dataframe (df) and I'm trying to append data to specific rows:

Index  Fruit   Rank
0      banana  1
1      apple   2
2      mango   3
3      Melon   4

The goal is to compare the Fruit at Rank 1 with the Fruit at every other rank and append each result as a value in that row. I'm using difflib.SequenceMatcher to make the comparison. Right now I'm able to append a column to df, but I end up with the same value in every row. I'm struggling with the loop and the append. Any pointers would be much appreciated.
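For reference, the difflib documentation defines SequenceMatcher.ratio() as 2.0*M/T, where T is the total number of characters in both strings and M is the number of matched characters. A minimal sketch scoring the Rank 1 fruit against the others with plain strings, no DataFrame involved (the list `fruits` is hypothetical, copied from the table above in rank order):

```python
import difflib

# Fruits from the table above, in rank order (Rank 1 first)
fruits = ['banana', 'apple', 'mango', 'Melon']
top = fruits[0]

scores = {}
for fruit in fruits[1:]:
    # Lowercase both sides so 'Melon' is compared as 'melon'
    ratio = difflib.SequenceMatcher(None, top.lower(), fruit.lower()).ratio()
    scores[fruit] = ratio
    print(fruit, round(ratio, 3))
```

Here 'banana' and 'mango' share the block "an", so that pair scores 2*2/11 ≈ 0.364, while 'apple' and 'melon' each share only one character with 'banana' and score 2/11 ≈ 0.182.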

Here is some of my code:

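import difflib
import pandas as pd

df = pd.DataFrame({'Fruit': ['banana', 'apple', 'mango', 'Melon'],
                   'Rank': [1, 2, 3, 4]})

new_entry = df[df.Rank == 1]
new_fruit = new_entry['Fruit']    # a one-row Series, not a plain string

prev_entry = df[df.Rank == 2]
prev_fruit = prev_entry['Fruit']  # also a one-row Series

# str() of a Series includes its index and "Name: Fruit, dtype: object"
similarity_score = difflib.SequenceMatcher(None, str(new_fruit).lower(), str(prev_fruit).lower()).ratio()

# assigning a single float to a column repeats it on every row
df['similarity_score'] = similarity_score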
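A small sketch of what goes wrong in the code above, assuming a DataFrame built from the table in the question: the filtered column is still a Series, str() turns it into its printed form (index, value, name and dtype) rather than the bare fruit name, and one float assigned to a column is broadcast to every row. The names `as_text` and `top_fruit` are only for illustration.

```python
import difflib
import pandas as pd

df = pd.DataFrame({'Fruit': ['banana', 'apple', 'mango', 'Melon'],
                   'Rank': [1, 2, 3, 4]})

new_fruit = df[df.Rank == 1]['Fruit']  # one-row Series
as_text = str(new_fruit)               # printed form, includes "Name: Fruit, dtype: object"
print(repr(as_text))

# .iloc[0] pulls out the scalar value by position
top_fruit = new_fruit.iloc[0]
print(top_fruit)  # banana

# One float assigned to a whole column is repeated on every row,
# so every row ends up with the same score
score = difflib.SequenceMatcher(None, 'banana', 'apple').ratio()
df['similarity_score'] = score
print(df['similarity_score'].nunique())  # 1
```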

The result is something like this:

Index  Fruit   Rank  similarity_score
0      banana  1     0.3
1      apple   2     0.3
2      mango   3     0.3
3      Melon   4     0.3

The desired result is:

Index  Fruit   Rank  similarity_score
0      banana  1     n/a
1      apple   2     0.4
2      mango   3     0.5
3      Melon   4     0.6
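One way to get the desired layout, sketched under two assumptions: "n/a" is represented as NaN, and the comparison should ignore case (as the `.lower()` calls in the question suggest). Every row is scored against the scalar Rank 1 fruit, then Series.where blanks out the Rank 1 row itself. The helper `similarity` is a hypothetical name.

```python
import difflib
import pandas as pd

df = pd.DataFrame({'Fruit': ['banana', 'apple', 'mango', 'Melon'],
                   'Rank': [1, 2, 3, 4]})

def similarity(a, b):
    """Case-insensitive SequenceMatcher ratio between two strings."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Scalar fruit name at Rank 1 (.iloc[0] works whatever its index label is)
top = df.loc[df['Rank'] == 1, 'Fruit'].iloc[0]

# Score every row against the Rank 1 fruit; .where keeps values where the
# condition is True and puts NaN elsewhere, i.e. on the Rank 1 row
df['similarity_score'] = (
    df['Fruit'].map(lambda fruit: similarity(top, fruit))
               .where(df['Rank'] != 1)
)
print(df)
```

With this data the scores come out as NaN, about 0.18, 0.36 and 0.18, so the 0.4/0.5/0.6 above are presumably placeholder values.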

Thanks.

Marcin
BlackHat

1 Answer


This doesn't give the similarity score order you want, but it calculates the SequenceMatcher ratio between the Rank 1 value ('banana') and each row's fruit, and adds the results as a new column.

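import pandas as pd
import difflib

df = pd.DataFrame({'Fruit': ['banana', 'apple', 'mango', 'melon'],
                   'Rank': [1, 2, 3, 4]})

# Fruit at Rank 1 as a plain string; .iloc[0] takes the first match by
# position, so it does not rely on that row's index label being 0
top = df.loc[df.Rank == 1, 'Fruit'].iloc[0]

# Score every fruit against the Rank 1 fruit and store the results as a column
df['similarity_score'] = df['Fruit'].apply(
    lambda x: difflib.SequenceMatcher(None, top, x).ratio())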
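Since the question mentions struggling with the loop, here is a sketch of an explicit-loop equivalent of the apply above: the column is created first, then each iteration writes one score into its own row with .loc. It is slower than apply on large frames but makes the per-row assignment visible. It uses the same sample data as the answer.

```python
import difflib
import pandas as pd

df = pd.DataFrame({'Fruit': ['banana', 'apple', 'mango', 'melon'],
                   'Rank': [1, 2, 3, 4]})

top = df.loc[df['Rank'] == 1, 'Fruit'].iloc[0]

# Create the column up front so each iteration fills in a single cell
df['similarity_score'] = float('nan')
for idx, fruit in df['Fruit'].items():
    # Write this row's score into this row only, not the whole column
    df.loc[idx, 'similarity_score'] = difflib.SequenceMatcher(None, top, fruit).ratio()

print(df)
```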
bananafish