I have a dataframe that lists objects and their different qualities in columns. One of those columns is the objects' colors. My goal is to write a function that creates a NEW column listing the fuzzy partial_ratio scores between the color of a SINGLE object (e.g., 'orange') and all the other objects' colors (e.g., "navy blue", "white", "red-orange").
Firstly, I have a function that searches the dataframe to find the color of an object. It is called findcolor(object). If I call findcolor(pumpkin), it searches the row 'pumpkin' and column 'Color' and returns the string "orange" (which is in that cell). I am calling this function inside another function, below, which allows me to compare two objects' colors inside the dataframe.
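from fuzzywuzzy import fuzz  # assumed import; the post does not show where fuzz comes from

def getsingleFuzzyScore(object1, object2):
    # Look up each object's color string in the dataframe
    x = findcolor(object2)
    b = findcolor(object1)
    # If either color string contains the other, treat it as a perfect match
    if b in x:
        return 100
    elif x in b:
        return 100
    else:
        # Otherwise fall back to the partial fuzzy ratio of the two colors
        return fuzz.partial_ratio(b, x)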
So, essentially, if object1's color is contained in object2's color or vice versa, my score will be 100; otherwise, it will be the partial fuzzy ratio of the two colors. This information (that is, my rendition of FuzzyScore) is what I want as a new column in the dataframe.
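To make the rule concrete, here is a minimal sketch of the same logic working directly on two color strings, without the dataframe lookup. The names color_score and stand_in_ratio are hypothetical, and stand_in_ratio is only a standard-library placeholder so the snippet runs on its own; it is not the same algorithm as fuzz.partial_ratio, which could be passed in as the scorer instead.

```python
from difflib import SequenceMatcher


def stand_in_ratio(a, b):
    """Stdlib placeholder scorer (0-100); NOT the same algorithm as fuzz.partial_ratio."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def color_score(color_a, color_b, scorer=stand_in_ratio):
    """Return 100 if either color string contains the other, else scorer(a, b)."""
    if color_a in color_b or color_b in color_a:
        return 100
    return scorer(color_a, color_b)


# "orange" is a substring of "red-orange", so the scorer is never called here.
print(color_score("orange", "red-orange"))  # 100
# No containment, so this falls through to the scorer.
print(color_score("orange", "navy blue"))
```

One edge case to be aware of: an empty string is contained in every string, so a missing color stored as "" would score 100 against everything.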
Where I am struggling is inputting the 'Color' column from the dataframe. This question (Comparing a single string to an array of strings in C) is what I am looking to do (but in python), and I would like to be able to call the object's column, i.e., 'Color', and have the color of pumpkin be compared to EACH of the remaining objects in the 'Color' column.
In conclusion, I would like the output to be a column of numbers that are the outputs of getsingleFuzzyScore for EACH object compared to the color of the object I input.
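For illustration, the desired result might look something like the sketch below. It rests on several assumptions: the dataframe is indexed by object name (as the 'pumpkin' row lookup suggests), the object names other than pumpkin and the helpers score and add_fuzzy_column are made up for the example, and the fallback scorer is a standard-library stand-in rather than fuzz.partial_ratio.

```python
import pandas as pd
from difflib import SequenceMatcher

# Hypothetical sample data; object names are assumed to be the index.
df = pd.DataFrame(
    {"Color": ["orange", "navy blue", "white", "red-orange"]},
    index=["pumpkin", "ocean", "snow", "sunset"],
)


def score(a, b):
    """100 if either string contains the other, else a similarity score."""
    if a in b or b in a:
        return 100
    # Stand-in scorer; swap in fuzz.partial_ratio(a, b) to match getsingleFuzzyScore.
    return round(100 * SequenceMatcher(None, a, b).ratio())


def add_fuzzy_column(frame, obj, column="Color", new_column="FuzzyScore"):
    """Score every row's color against the color of `obj` and store it in a new column."""
    target = frame.loc[obj, column]  # e.g. "orange" for "pumpkin"
    frame[new_column] = frame[column].apply(lambda c: score(target, c))
    return frame


add_fuzzy_column(df, "pumpkin")
print(df)
```

Note that the input object's own row is scored as well (it always gets 100, since its color contains itself), so it would need to be dropped or masked if only the remaining objects should appear.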