
I have a pandas data frame with two columns containing strings, like below:

Col-1                 Col-2
Animal                have an apple
Fruit                 tiger safari
Veg                   Vegetable Market
Flower                Garden

From this I need to create a function that takes a string as its argument.

The function then computes the fuzzywuzzy similarity between the input string and each element of Col-2, and outputs the Col-2 element with the highest similarity together with its corresponding Col-1 element.

For instance, suppose the input string is Gardening Hobby. The function checks its similarity against every element of df['Col-2'] and finds that Garden has the highest similarity with Gardening Hobby, with a score of 90. The expected output is then:

I/P               O/P
Gardening Hobby   Garden(60),Flower
ysearka
ssp
  • Your problem isn't really clear. What is the `count` you are talking about? Why should the string `BOTH` throw `Error in Message`? What kind of similarity are you trying to compute? What do you do with the similarity computed between your input string and the elements of `Col-2`? How do you get `Garden(60),Flower` as output in your example? Please make your issue clearer if you want an answer. – ysearka Aug 23 '18 at 12:04
  • @ysearka I have edited my question. I hope it is fine now. – ssp Aug 23 '18 at 12:27
  • Not just yet, in your example you check the similarity between your input string and elements of `df['Col-2']`, but in your previous paragraph you say that you also need to compute the similarity with the first column? And what is the `60` in your output? The similarity score? If so, how is it computed? – ysearka Aug 23 '18 at 12:35
  • @ysearka 60 is the similarity score. It has to be computed using fuzzywuzzy; I have not computed it, it is just an example. I only need to check the similarity of the input string against the Col-2 values. – ssp Aug 23 '18 at 12:44
  • The solution of gyx-hh seems to be working, please consider accepting it if it solves your issue. – ysearka Aug 23 '18 at 14:05

1 Answer


Try the following approach using the fuzzywuzzy library (see its tutorial):

from fuzzywuzzy import process

search_str = 'Gardening Hobby'
# extract the best match of search_str in df['Col-2']
best_match = process.extractOne(search_str, df['Col-2'])
print(best_match)  # output: ('Garden', 90, 3)  (match,score,index)

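# get results for 'Col-1' using the index; extractOne returns the index
# label of the Series, so look it up with .loc rather than positional .iloc
res = df.loc[best_match[2], 'Col-1']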
print(res)  # output: 'Flower'

# construct the output string as you wish
print('%s(%d), %s' % (best_match[0], best_match[1], res))
# output: 'Garden(90), Flower'
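
Since the question asks for a function that takes the string as an argument, the steps above could be wrapped up roughly like this. This is only a sketch under some assumptions: `simple_ratio` and `best_match` are hypothetical names, the two columns are passed as plain sequences (`df['Col-1']` and `df['Col-2']` should work the same way), and the default scorer is the standard library's `difflib.SequenceMatcher` so that the snippet runs without extra installs. It is not fuzzywuzzy's scoring, so the numbers differ from the ones above.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """0-100 similarity from the standard library (NOT fuzzywuzzy's WRatio)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def best_match(query, col1, col2, scorer=simple_ratio):
    """Return 'match(score), label' for the col2 entry most similar to query."""
    # Score every Col-2 value against the query, keeping its Col-1 partner.
    scored = [(scorer(query, text), text, label)
              for label, text in zip(col1, col2)]
    # Keep the row with the highest score (the first one wins on ties).
    score, text, label = max(scored, key=lambda item: item[0])
    return '%s(%d), %s' % (text, score, label)


# Sample data copied from the question.
col1 = ['Animal', 'Fruit', 'Veg', 'Flower']
col2 = ['have an apple', 'tiger safari', 'Vegetable Market', 'Garden']

print(best_match('Gardening Hobby', col1, col2))  # Garden(57), Flower
```

With the difflib-based scorer the best match is still Garden, but with a score of 57 instead of 90. Passing `scorer=fuzz.WRatio` (after `from fuzzywuzzy import fuzz`) should give scores in line with `process.extractOne`, which uses that scorer by default.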
gyx-hh