My dataframe:
import numpy as np
import pandas as pd
a = [['a', 'q1', False], ['a', 'q1', False], ['a', 'q1', False], ['b', 'q1', True], ['b', 'q2', True], ['c', 'q1', True], ['c', 'q2', True], ['c', 'q3', False], ['c', 'q4', False], ['d', 'q1', False], ['e', 'q1', True], ['e', 'q2', True], ['e', 'q3', True], ['e', 'q4', True], ['e', 'q5', True], ['f', 'q3', True], ['f', 'q4', True], ['f', 'q5', True], ['f', 'q6', True], ['g', 'q1', True], ['g', 'q2', True]]
df_testing = pd.DataFrame(a, columns=['user', 'question', 'Truth'])
I want to build a recommender system using surprise. I want to recommend questions for, say, user f.
from surprise import NMF, SVD, SVDpp, KNNBasic, KNNWithMeans, KNNWithZScore, CoClustering
from surprise.model_selection import cross_validate
from surprise import Reader, Dataset
d = {
'True': True,
'False': False
} # to convert to boolean
df_testing.Truth = df_testing.Truth.astype(int)
reader = Reader(rating_scale=(0, 1))
data = Dataset.load_from_df(df_testing, reader)
unique_qs = df_testing['question'].unique()
my_user = df_testing.loc[df_testing['user']=='f', 'question']
qs_to_predict = np.setdiff1d(unique_qs,my_user)
algo = NMF()
algo.fit(data.build_full_trainset())
my_qs = []
for _id in qs_to_predict:
    my_qs.append((_id, algo.predict(uId='f',_id=_id).est))
pd.DataFrame(my_qs, columns=['_id', 'predictions']).sort_values('predictions', ascending=False)
I get:
ZeroDivisionError Traceback (most recent call last)
<ipython-input-74-19f400187aed> in <module>
1 algo = NMF()
----> 2 algo.fit(data.build_full_trainset())
3 my_qs = []
4 for _id in qs_to_predict:
5 my_qs.append((_id, algo.predict(uId='f',_id=_id).est))
~\anaconda3\lib\site-packages\surprise\prediction_algorithms\matrix_factorization.pyx in surprise.prediction_algorithms.matrix_factorization.NMF.fit()
~\anaconda3\lib\site-packages\surprise\prediction_algorithms\matrix_factorization.pyx in surprise.prediction_algorithms.matrix_factorization.NMF.sgd()
ZeroDivisionError: float division
I assume it is because of the binary rating scale, but in this case I can't just discard the zeroes. I also tried a non-binary rating scale, with ratings from 1 to 3, and got the same error. Is there a way to avoid this error?
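One thing that may be worth checking before blaming the rating scale itself: whether some users or questions have nothing but zero ratings. This is only a guess at a possible trigger, not something confirmed from the surprise source. The sketch below rebuilds the same data with plain pandas and lists those users and questions. The helper name all_zero_groups is made up for illustration, and the sum-equals-zero test assumes ratings are never negative.

```python
import pandas as pd

# Same rows as in the question.
rows = [
    ['a', 'q1', False], ['a', 'q1', False], ['a', 'q1', False],
    ['b', 'q1', True], ['b', 'q2', True],
    ['c', 'q1', True], ['c', 'q2', True], ['c', 'q3', False], ['c', 'q4', False],
    ['d', 'q1', False],
    ['e', 'q1', True], ['e', 'q2', True], ['e', 'q3', True], ['e', 'q4', True],
    ['e', 'q5', True],
    ['f', 'q3', True], ['f', 'q4', True], ['f', 'q5', True], ['f', 'q6', True],
    ['g', 'q1', True], ['g', 'q2', True],
]
df = pd.DataFrame(rows, columns=['user', 'question', 'Truth'])
df['Truth'] = df['Truth'].astype(int)


def all_zero_groups(frame, key, rating_col='Truth'):
    """Hypothetical helper: values of `key` whose ratings are all zero.

    Assumes non-negative ratings, so a group sum of 0 means every
    rating in that group is 0.
    """
    totals = frame.groupby(key)[rating_col].sum()
    return sorted(totals[totals == 0].index)


zero_users = all_zero_groups(df, 'user')
zero_items = all_zero_groups(df, 'question')
print(zero_users)  # ['a', 'd']
print(zero_items)  # []
```

Here users a and d have only zero ratings, while every question has at least one non-zero rating. If the error goes away when those users are left out of the training set, that would point to all-zero rows as the cause, though that remains to be verified.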