My dataframe:

import pandas as pd

a = [
    ['a', 'q1', False], ['a', 'q1', False], ['a', 'q1', False],
    ['b', 'q1', True], ['b', 'q2', True],
    ['c', 'q1', True], ['c', 'q2', True], ['c', 'q3', False], ['c', 'q4', False],
    ['d', 'q1', False],
    ['e', 'q1', True], ['e', 'q2', True], ['e', 'q3', True], ['e', 'q4', True], ['e', 'q5', True],
    ['f', 'q3', True], ['f', 'q4', True], ['f', 'q5', True], ['f', 'q6', True],
    ['g', 'q1', True], ['g', 'q2', True],
]
df_testing = pd.DataFrame(a, columns=['user', 'question', 'Truth'])

I want to build a recommender system with Surprise and use it to recommend questions to a given user, say user f.

import numpy as np
from surprise import NMF, SVD, SVDpp, KNNBasic, KNNWithMeans, KNNWithZScore, CoClustering
from surprise.model_selection import cross_validate
from surprise import Reader, Dataset

# Cast the boolean Truth column to 0/1 so it fits the rating scale below
df_testing.Truth = df_testing.Truth.astype(int)

reader = Reader(rating_scale=(0, 1))
data = Dataset.load_from_df(df_testing, reader)

unique_qs = df_testing['question'].unique()
my_user = df_testing.loc[df_testing['user']=='f', 'question']
qs_to_predict = np.setdiff1d(unique_qs,my_user)

algo = NMF()
algo.fit(data.build_full_trainset())
my_qs = []
for _id in qs_to_predict:
    # predict() takes the raw user id first, then the raw item id
    my_qs.append((_id, algo.predict('f', _id).est))

pd.DataFrame(my_qs, columns=['_id', 'predictions']).sort_values('predictions', ascending=False)

I get:

ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-74-19f400187aed> in <module>
      1 algo = NMF()
----> 2 algo.fit(data.build_full_trainset())
      3 my_qs = []
      4 for _id in qs_to_predict:
      5     my_qs.append((_id, algo.predict(uId='f',_id=_id).est))

~\anaconda3\lib\site-packages\surprise\prediction_algorithms\matrix_factorization.pyx in surprise.prediction_algorithms.matrix_factorization.NMF.fit()

~\anaconda3\lib\site-packages\surprise\prediction_algorithms\matrix_factorization.pyx in surprise.prediction_algorithms.matrix_factorization.NMF.sgd()

ZeroDivisionError: float division

I assume the binary rating scale is the cause, but in this case I can't simply discard the zeroes. I also tried a non-binary scale, with ratings from 1 to 3, and got the same error. Is there a way to avoid this error?
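One way to test that assumption before blaming the rating scale as a whole is to look for users or questions whose ratings are all 0. This is only a diagnostic sketch: it assumes (without confirmation from the Surprise source) that such all-zero rows or columns are what drive a denominator in NMF's update to zero. The `rows` list mirrors `a` with the booleans already cast to 0/1, and `all_zero_keys` is a hypothetical helper, not part of Surprise.

```python
from collections import defaultdict

# Same rows as the list `a` above, with True/False already cast to 1/0.
rows = [
    ("a", "q1", 0), ("a", "q1", 0), ("a", "q1", 0),
    ("b", "q1", 1), ("b", "q2", 1),
    ("c", "q1", 1), ("c", "q2", 1), ("c", "q3", 0), ("c", "q4", 0),
    ("d", "q1", 0),
    ("e", "q1", 1), ("e", "q2", 1), ("e", "q3", 1), ("e", "q4", 1), ("e", "q5", 1),
    ("f", "q3", 1), ("f", "q4", 1), ("f", "q5", 1), ("f", "q6", 1),
    ("g", "q1", 1), ("g", "q2", 1),
]


def all_zero_keys(rows, index):
    """Return the users (index=0) or questions (index=1) whose ratings are all 0.

    Hypothetical helper for illustration only.
    """
    ratings = defaultdict(list)
    for row in rows:
        ratings[row[index]].append(row[2])
    # `not any(values)` is True only when every rating for that key is 0
    return sorted(key for key, values in ratings.items() if not any(values))


zero_users = all_zero_keys(rows, 0)
zero_questions = all_zero_keys(rows, 1)
print("users with only zero ratings:", zero_users)
print("questions with only zero ratings:", zero_questions)
```

If the check flags any users or questions, refitting without them (or with a different algorithm for those rows) would show whether they are what triggers the error.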

futuredataengineer
  • In this line `algo.fit(data.build_full_trainset())` - what is `data`? I don't see that variable defined in this code example. Should this be `data_testing`? – PeptideWitch Nov 09 '21 at 05:51
  • @PeptideWitch, sorry, I've corrected it: `data = Dataset.load_from_df(df_testing, reader)`. – futuredataengineer Nov 09 '21 at 05:57
  • @PeptideWitch I tried a non-binary classification, with ratings from 1 to 3, and I get the same error. – futuredataengineer Nov 09 '21 at 07:51
  • [After reading about a similar error here](https://stackoverflow.com/questions/60151847/surprise-nmf-throws-zerodivisionerror-float-division), I think there's an issue with the way you have structured your input data. I can't validate this because I'm having trouble installing `surprise` on my system, but maybe your user and question fields need to have at least one of each type. Can you try removing the two redundant `['a', 'q1', False]` rows from your example `a` and see if that works? If not, the issue may be that not every user has a valid response for each question asked (i.e. missing values). – PeptideWitch Nov 10 '21 at 00:50
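The suggestion in the last comment, removing the repeated `['a', 'q1', False]` rows, could be tried with something like the sketch below. It is untested against Surprise itself; `drop_duplicate_pairs` is a hypothetical helper, the `rows` list is only a shortened subset of `a` with booleans cast to 0/1, and keeping the first rating per pair is an arbitrary choice made for illustration.

```python
def drop_duplicate_pairs(rows):
    """Keep the first rating seen for each (user, question) pair.

    Hypothetical helper for illustration only.
    """
    first_seen = {}
    for user, question, rating in rows:
        # setdefault leaves an existing entry untouched, so later repeats are ignored
        first_seen.setdefault((user, question), rating)
    return [(user, question, rating)
            for (user, question), rating in first_seen.items()]


# Shortened subset of the question's data, booleans cast to 0/1.
rows = [
    ("a", "q1", 0), ("a", "q1", 0), ("a", "q1", 0),
    ("b", "q1", 1), ("b", "q2", 1),
    ("d", "q1", 0),
]

deduped = drop_duplicate_pairs(rows)
print(deduped)
```

Note that after deduplication user a still has a single rating of 0, so this step alone does not remove users whose ratings are all zero.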

0 Answers