
I'm relatively new to programming and while doing my university assignment, I've been running into problems with the process.extract() function from the fuzzywuzzy package.

The documentation says the function should return a list; however, my code returns an object with a class type of None.

If I print the result of process.extract() with limit=1, it looks like a list:

[("Matching string","Fuzz Ratio", "Index")]

However, I can't slice this object using [] because it is classified as None.

I need to get the index of the matching string, so I need to find out either why process.extract() isn't returning a list or how to get the last element out of the None object.

Thanks and if any clarification is required, let me know.

EDIT: OK, my bad, guys. I'll try to be more specific.

Minimal reproducible sample:

Note: The CSV file is simply two columns of questions and answers. It's from my university, so I'm not sure if I'm allowed to upload the exact file. Sorry about this.

OK, so weirdly, recreating the example fixed it, but I still can't tell what is happening.

The first example is taken from my original code:

import pandas as pd 
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
df_chat_bot = pd.read_csv("ChatBot-short.csv")
str_user_question = input("Question: ")
test_var = print(process.extract(str_user_question, df_chat_bot["Question"], limit = 1))
print(type(test_var))

Output of the print(type(test_var)) line is (direct from terminal) "<class 'NoneType'>".

Then, when I went to make the reproducible bit of code:

from fuzzywuzzy import process
from fuzzywuzzy import fuzz
import pandas as pd  
test_user_question = "vsc"
df_chat_bot = pd.read_csv("ChatBot-short.csv")
temp_var = process.extract(test_user_question,df_chat_bot["Question"], limit = 1)
print(temp_var)
print(type(temp_var))

Printing this type results in "<class 'list'>".

So what's going on here? Note that I commented out all other code from my original file so the code I pasted here is the only code being run.

smci
Adreto
  • Could you please edit your question to provide a minimal reproducible example? – tripleee Jun 11 '21 at 04:09
  • "A class type of `None`" makes no sense - `NoneType` is at least possible, but such an object could not have a string representation other than "None". Please show actual code/transcript that exhibits this `None` type you're talking about. – jasonharper Jun 11 '21 at 04:09
  • To be specific, that's a list with one element, which is a tuple of 3 elements. – Tim Roberts Jun 11 '21 at 04:21

1 Answer


See the difference:

Your code:

test_var = print(process.extract(str_user_question, df_chat_bot["Question"], limit = 1))

Test code:

temp_var =       process.extract(test_user_question,df_chat_bot["Question"], limit = 1)

In your code, the process.extract() call is wrapped in print(). The function print() displays its argument but always returns None, so None is what gets assigned to test_var. That is why type(test_var) is NoneType and why indexing test_var with [] raises an error.

If you want to both assign and print the return value of process.extract(...), you have to do them in separate statements, as is done in the second example.
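As a minimal sketch of the difference: the hypothetical `fake_extract()` below stands in for `process.extract()`. It is not part of fuzzywuzzy and simply returns a hard-coded result, so the snippet runs without the CSV file.

```python
def fake_extract(query, choices, limit=1):
    # Hypothetical stand-in for process.extract(); returns a fixed list
    # holding one (value, score, key) tuple so the example is self-contained.
    return [("What is VSC?", 90, 0)]


# Wrapping the call in print() displays the result, but what gets stored
# is print()'s own return value, which is always None.
wrong = print(fake_extract("vsc", ["What is VSC?"]))
print(type(wrong))  # <class 'NoneType'>

# Assign first, then print in a separate statement.
right = fake_extract("vsc", ["What is VSC?"])
print(right)
print(type(right))  # <class 'list'>
```

With the assignment on its own line, `right` is a real list and can be indexed as usual.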

Green Cloak Guy
  • I see! Thank you! I hope you don't mind answering but now that I understand process.extract() better (it returns a list of 3 tuples, instead of a list of 3 elements), how can I select the tuple that I want to use? I tried temp_var[-1] but it just returns the set of 3 tuples because it is under the first element of the list. I tried to google how to do this to no success. Thank you in advance if you do help me out :) – Adreto Jun 11 '21 at 05:25
  • It looks like `process.extract()` returns a list of tuples, where each tuple represents `(value, score[, key])` for a single item in the `choices`. The list holds at most `limit` tuples (5 by default), sorted from the best score to the worst, so with `limit = 1` it contains a single tuple. To get the matched value from that tuple you'd do `temp_var[0][0]`, and to get its key (the index here) you'd do `temp_var[0][-1]`. Note that `process.extractOne()` returns just the single best tuple rather than a list of them. – Green Cloak Guy Jun 11 '21 at 05:31
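
To illustrate indexing into a list of tuples, here is a small sketch on hard-coded sample data shaped like the `(value, score, key)` tuples from the question. The strings and scores are made up for illustration and are not real fuzzywuzzy output.

```python
# Made-up sample shaped like the question's output:
# a list of (matched value, score, index label) tuples.
matches = [
    ("What is VSC?", 90, 2),
    ("What is VS?", 86, 0),
    ("How do I install Python?", 30, 1),
]

# Pick the tuple with the highest score (element 1 of each tuple).
best = max(matches, key=lambda tup: tup[1])

# Unpack the three fields of the chosen tuple.
value, score, index = best
print(value)
print(index)

# With limit=1 the list holds a single tuple, so index the list first
# and then the tuple: [0] selects the tuple, [-1] its last element.
single = [("What is VSC?", 90, 2)]
print(single[0][-1])
```

Indexing twice is the key step: `single[-1]` alone returns the whole tuple, because the tuple is itself the last (and only) element of the list.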