Using similarities.cosine (with dataset) of SurPRISE package python

Question

Briefing:

I'm working on the MovieLens 100k dataset to recommend movies. So far I've done the following.

1. Sorting of values

df_sorted_values = df.sort_values(['UserID', 'MovieID'])
print(type(df_sorted_values))

2. Printing Matrix with NaN values

df_matrix = df.pivot_table(values='Rating', index='UserID', columns='MovieID')

3. Performed 5 Fold CV on it

reader = Reader(line_format="user item rating", sep='\t', rating_scale=(1,5))
df = Dataset.load_from_file('ml-100k/u.data', reader=reader)
df.split(n_folds=5)

4. I've evaluated the dataset using SVD

perf = evaluate(SVD(),df,measures=['RMSE','MAE'])
print_perf(perf)

Here I need to use the similarity algorithm provided by the same package (Surprise), surprise.similarities.cosine, to predict the missing values. Its signature shows only (*args, **kwargs), and I'm clueless as to what should actually be passed.

Once the similarities are generated, I need to print the matrix with the NaN values replaced by the predicted ones; it will later be used for recommendation.

P.S. I'm open to different solutions from CRAB, RECSYS, PANDAS and GRAPHLAB provided they can be worked out on steps 1 to 4 as well

My past references have been:

This manual, but it shows neither how the arguments are passed nor an example.
This, which doesn't differ much from the first.

score 0 · Answer 1 · answered Jul 03 '17 at 11:03

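Computing the cosine similarity between 2 vectors is very easy: np.dot(a,b)/(np.linalg.norm(a)*np.linalg.norm(b)). Subtracting that value from 1 gives the cosine distance.

I would recommend working with SciPy if you don't want to implement it yourself; note that its cosine function returns the distance (1 minus the similarity), not the similarity: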

from scipy.spatial.distance import cosine
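
For rating vectors with missing entries, a minimal NumPy sketch could look like the one below. The helper name cosine_similarity and the two sample vectors are made up for illustration, and restricting the computation to items rated in both vectors is an assumption about how one might treat NaN values, not code taken from Surprise or SciPy.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity of two rating vectors, using only co-rated entries."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    both = ~np.isnan(a) & ~np.isnan(b)  # positions rated in both vectors
    if not both.any():
        return 0.0                      # no overlap: treat as "no similarity"
    a, b = a[both], b[both]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

# Two hypothetical users; NaN marks a movie the user has not rated.
u1 = [5.0, 3.0, np.nan, 1.0]
u2 = [4.0, np.nan, 2.0, 1.0]

sim = cosine_similarity(u1, u2)  # only the first and last movie are co-rated
dist = 1.0 - sim                 # cosine distance, the quantity SciPy's cosine() reports
print(sim, dist)
```

This only compares two vectors at a time; it does not by itself predict missing ratings.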

Binyamin Even

I don't see how this helps with a large dataset like MovieLens 100k, but I have since found a workaround in the same package. Grateful for your reply here. – T3J45 Jul 03 '17 at 11:07
If you can shed some light on how I can export the dataset from the Surprise package after applying an algorithm, that'd be helpful! – T3J45 Jul 03 '17 at 11:09

score 0 · Answer 2 · answered Sep 12 '17 at 15:28

Those similarity functions are used as described in these docs: Using prediction algorithms, FAQ, and The algorithm base class - compute_similarities. They serve the KNN-based algorithms and are not supposed to be called directly the way you want to.

If you choose the SVD algorithm, you may want to use the predict function (see The algorithm base class - predict), like:

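from surprise import SVD, Dataset

# Load MovieLens 100k and build a trainset from all available ratings.
data = Dataset.load_builtin('ml-100k')
trainset = data.build_full_trainset()

# Build an algorithm, and train it.
algo = SVD()
algo.fit(trainset)  # called algo.train(trainset) in older Surprise releases
uid = str(196)  # raw user id (a string, as read from the file)
iid = str(302)  # raw item id (a string, as read from the file)
# Get a prediction for a specific user and item.
pred = algo.predict(uid, iid)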
