
Briefing:

I'm working with the MovieLens 100k dataset to recommend movies. So far I have done the following:

  1. Sorting of values

    df_sorted_values = df.sort_values(['UserID', 'MovieID'])
    print(type(df_sorted_values))

  2. Printing Matrix with NaN values

    df_matrix = df.pivot_table(values='Rating', index='UserID', columns='MovieID')

  3. Performed 5 Fold CV on it

    reader = Reader(line_format="user item rating", sep='\t', rating_scale=(1,5))
    df = Dataset.load_from_file('ml-100k/u.data', reader=reader)
    df.split(n_folds=5)

  4. I've evaluated the dataset using SVD

    perf = evaluate(SVD(),df,measures=['RMSE','MAE'])
    print_perf(perf)

  5. Here I need to use the similarity algorithm provided by the same package (Surprise), written as surprise.cosine, to predict the missing values. Its signature shows that it takes (*args, **kwargs), but I'm clueless as to what should actually be passed.

  6. Once the similarities are generated, I need to print the matrix with the NaN values replaced by the predicted ones; it will later be used for recommendation.

P.S. I'm open to different solutions using CRAB, RECSYS, PANDAS and GRAPHLAB, provided they also work with steps 1 to 4.

My past references have been:

  1. This manual, but it doesn't show how the arguments are passed, nor does it give an example
  2. This, which doesn't differ much from the first
T3J45

2 Answers


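Computing the cosine similarity between 2 vectors yourself is very easy: np.dot(a,b)/(np.linalg.norm(a)*np.linalg.norm(b)). Subtract that value from 1 to get the cosine distance.

If you don't want to implement it yourself, I would recommend working with SciPy. Note that its cosine function returns the distance (1 minus the similarity), not the similarity itself: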

from scipy.spatial.distance import cosine
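As a rough illustration of the formula above, here is a minimal NumPy sketch. The helper names `cosine_similarity` and `cosine_distance` and the two rating vectors are made up for this example, and treating an unrated movie as 0 is an assumption, not something the dataset prescribes.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def cosine_distance(a, b):
    """1 - similarity, the quantity scipy.spatial.distance.cosine returns."""
    return 1.0 - cosine_similarity(a, b)


# Made-up ratings of two users over the same four movies (0 = not rated,
# which is an assumption of this sketch).
u = [5, 3, 0, 1]
v = [4, 0, 0, 1]
print(cosine_similarity(u, v))
print(cosine_distance(u, v))
```

Both helpers divide by the vector norms, so an all-zero vector (a user with no ratings) would need to be handled separately.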

Binyamin Even
  • I don't see how this is helpful for a huge dataset like MovieLens 100k, but I have now found another way in the same package. Grateful for your reply here. – T3J45 Jul 03 '17 at 11:07
  • If you can shed some light on how I can export the dataset in the Surprise package after applying the algorithm, that'd be helpful! – T3J45 Jul 03 '17 at 11:09

Those similarity functions are used as described in these docs: Using prediction algorithms, FAQ, and The algorithm base class - compute_similarities, i.e. inside KNN-based algorithms. They are not meant to be called directly the way you want to.
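To make that concrete, here is a small sketch of how the cosine measure is normally selected in Surprise: through a `sim_options` dict handed to a KNN-style algorithm rather than by calling `surprise.cosine` yourself. The helper `build_knn` is a hypothetical wrapper written only for this example; `KNNBasic` and the `name` / `user_based` keys come from the Surprise docs.

```python
# Similarity settings are passed to KNN-style algorithms as a plain dict.
sim_options = {
    "name": "cosine",    # similarity measure, e.g. cosine, msd or pearson
    "user_based": True,  # True: user-user similarities, False: item-item
}


def build_knn(options):
    """Return an untrained KNNBasic that uses the given similarity options.

    Hypothetical helper for this sketch; it needs the Surprise package,
    which is imported lazily so the dict above can be inspected without it.
    """
    from surprise import KNNBasic
    return KNNBasic(sim_options=options)
```

Once an algorithm built this way has been trained on a trainset, the library computes the similarities internally and, as far as I can tell, keeps the resulting matrix on the algorithm's `sim` attribute.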

If you choose to use the SVD algorithm, you may want to use the predict function (see The algorithm base class - predict), like this:

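from surprise import SVD

# Build an algorithm and train it on the full dataset loaded in step 3.
trainset = df.build_full_trainset()
algo = SVD()
algo.train(trainset)  # newer Surprise releases name this method fit()
uid = str(196)  # raw user id
iid = str(302)  # raw item id
# Get a prediction for a specific user and item.
pred = algo.predict(uid, iid)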
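For step 6 of the question (replacing the NaN cells with predictions), one possible approach is to walk over the matrix and ask a predictor for every missing cell. The sketch below is only an illustration: `fill_missing` and its `predict` callable are hypothetical names, the ratings are toy data, and the matrix is assumed to be a nested dict of the form user id -> item id -> rating.

```python
import math


def fill_missing(matrix, predict):
    """Return a copy of matrix with every NaN cell replaced by a prediction.

    matrix  : dict mapping user id -> dict mapping item id -> rating
              (NaN where the user has not rated the item)
    predict : any callable (user_id, item_id) -> float; a hypothetical hook
    """
    filled = {}
    for user, row in matrix.items():
        filled[user] = {}
        for item, rating in row.items():
            missing = isinstance(rating, float) and math.isnan(rating)
            filled[user][item] = predict(user, item) if missing else rating
    return filled


nan = float("nan")
# Toy ratings, not taken from the real dataset.
ratings = {196: {242: 3.0, 302: nan}, 186: {242: nan, 302: 3.0}}
# Stand-in predictor that always answers 3.5.
demo = fill_missing(ratings, lambda user, item: 3.5)
print(demo)
```

With a trained Surprise algorithm, the predictor could be something like `lambda u, i: algo.predict(str(u), str(i)).est`. A pandas pivot table can be turned into this nested-dict shape with `df_matrix.to_dict(orient='index')` and back with `pd.DataFrame.from_dict(filled, orient='index')`.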
xgdgsc