
I'm building a recommender system that uses item-based collaborative filtering, but I have a problem with the predict function: I don't know which function to use once the similarities between items (movies) have been calculated with the Tanimoto coefficient (Jaccard similarity coefficient). The following example illustrates my problem. Assume that user 1 watched movie 1, and when we calculated the Tanimoto coefficient between movie 1 and all other movies, the top-5 most similar movies were 527, 595, 608, 1097 and 588. Each of these movies has its own similarity with movie 1, as follows:

User: 1

| Watched movie | Similar movie | Tanimoto coefficient score |
|---------------|---------------|----------------------------|
| 1             | 527           | 0.33242                    |
| 1             | 595           | 0.3377                     |
| 1             | 608           | 0.3523                     |
| 1             | 1097          | 0.3619                     |
| 1             | 588           | 0.42595                    |

So what is the next step after calculating the similarities? I would appreciate any help with this.

PS: I found that all of the top-5 movies (527, 595, 608, 1097 and 588) had already been watched by user 1, so they cannot be considered as recommended movies.

Many thanks

easy

1 Answer


First, in both methods, user-to-user and item-to-item, two functions are defined: similarity and predict. The similarity function measures how close two entities (users or items) are to each other; in your case, Tanimoto was chosen. What you are missing is the predict function. Once you have the nearest entities (in item-to-item, the nearest items), you have to predict the rating value (or, with implicit user feedback, whether something will happen). The simplest form is a weighted average, where the weight is the similarity measure:

[Image: formula for the weighted-average prediction]

The average should be calculated only for items the user has not rated. This is one of the simplest ways to generate recommendations for a particular user with item-to-item filtering.
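The formula image is not reproduced in this copy. A common form of the similarity-weighted average that matches the description (weight = similarity, and r(u,j) being the rating user u actually gave item j) is sketched below. The set name RatedItems(u) is an assumption made here for illustration: the sums have to run over items that u has rated, because r(u,j) is that user's own rating, while the target item i is one that u has not rated yet.

$$
\mathrm{predict}(u,i) \;=\; \frac{\sum_{j \in \mathrm{RatedItems}(u)} \mathrm{sim}(i,j)\, r_{u,j}}{\sum_{j \in \mathrm{RatedItems}(u)} \left|\mathrm{sim}(i,j)\right|}
$$

Because the result is a weighted average of ratings, it stays within the rating scale (for example 1-5) as long as the similarities are non-negative, which is the case for the Tanimoto coefficient.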

A quick example. Given a rating matrix R:

We are trying to predict the rating of user 1 for item 1. The Tanimoto similarity measure is used in the calculation below.

[Image: worked calculation of the predicted rating]

So we predict that user 1 will give item 1 a rating of 4/5.
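As a rough sketch of the two functions discussed in this answer, the Python below computes a Tanimoto similarity between items and a similarity-weighted prediction. The ratings, the user and item names, and the helper names (`users_who_rated`, `tanimoto`, `predict`) are all made up for illustration; this is not the matrix R from the example above, and it assumes the Tanimoto coefficient is taken over the sets of users who rated each item.

```python
from typing import Dict, Optional, Set

# Hypothetical ratings on a 1-5 scale: user -> {item: rating}.
ratings: Dict[str, Dict[str, float]] = {
    "u1": {"B": 5, "C": 3},
    "u2": {"A": 4, "B": 5},
    "u3": {"A": 2, "C": 3},
    "u4": {"A": 5, "B": 4, "C": 1},
    "u5": {"B": 2},
}


def users_who_rated(item: str) -> Set[str]:
    """Set of users that have a rating for the given item."""
    return {user for user, rated in ratings.items() if item in rated}


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) coefficient: |a & b| / |a | b|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict(user: str, item: str) -> Optional[float]:
    """Similarity-weighted average of the user's own ratings.

    The sum runs over items j that the user has rated; each rating
    r(u, j) is weighted by sim(item, j).
    """
    target_users = users_who_rated(item)
    numerator = 0.0
    denominator = 0.0
    for j, r_uj in ratings[user].items():
        sim = tanimoto(target_users, users_who_rated(j))
        numerator += sim * r_uj
        denominator += abs(sim)
    # No similar rated item: no prediction can be made.
    return numerator / denominator if denominator else None


# u1 has not rated item "A"; the prediction is about 3.89.
print(predict("u1", "A"))
```

With these made-up numbers, sim(A, B) = 2/5 and sim(A, C) = 2/4, so the prediction is (0.4·5 + 0.5·3) / 0.9, which lies between the two ratings that user u1 actually gave.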

For performance reasons we keep only the top-N most similar items indexed, but the items that end up being recommended should still be new to the user for whom the recommendation is generated.
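One way to combine a precomputed top-N similarity index with the "new to the user" requirement is sketched below. The function name `recommend`, the layout of `sims` (candidate item -> its most similar items and their similarities), and the sample numbers are assumptions for illustration, not part of any particular library.

```python
from typing import Dict, List, Tuple


def recommend(user_ratings: Dict[int, float],
              sims: Dict[int, Dict[int, float]],
              n: int = 5) -> List[Tuple[int, float]]:
    """Rank items the user has not rated by similarity-weighted average."""
    scored: List[Tuple[int, float]] = []
    for item, neighbours in sims.items():
        if item in user_ratings:
            continue  # already rated: not a candidate for recommendation
        numerator = 0.0
        denominator = 0.0
        for j, sim in neighbours.items():
            if j in user_ratings:  # only the user's own ratings count
                numerator += sim * user_ratings[j]
                denominator += abs(sim)
        if denominator > 0:
            scored.append((item, numerator / denominator))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical data: the user rated items 10 and 20.
user_ratings = {10: 5.0, 20: 2.0}
sims = {
    10: {20: 0.5},             # skipped, the user already rated item 10
    30: {10: 0.5, 20: 0.25},
    40: {10: 0.25, 20: 0.75},
}
print(recommend(user_ratings, sims))  # [(30, 4.0), (40, 2.75)]
```

Items the user has already rated are skipped as candidates but still serve as the evidence for scoring the remaining items, which is why a top-5 list consisting only of watched movies yields no recommendations by itself.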

  • Dear @BartłomiejTwardowski, thank you very much for your valuable answer. I have two questions about the prediction equation you posted. 1) Does `r_u_j` (in the numerator) represent the average of the ratings given to the recommended item by the other users (in the case of item-item CF)? 2) If yes, and we consider the top-5 and apply the equation you mentioned, for which item of the top-5 (i.e. 527, 595, 608, 1097 and 588) is the returned prediction score? This example will explain my second question. ...continue – easy May 24 '15 at 16:42
  • For instance, user 1 watched item 1 in the example above. Suppose we apply the equation and take `r_u_j` as the average rating of each item: 527 = 4.40, 595 = 3.2, 608 = 2.0, 1097 = 4.40 and 588 = 3.5. Applying the equation gives ((0.33242 + 4.40)/0.33242) + ((0.3377 + 3.2)/0.3377) + ((0.3523 + 2.0)/0.3523) + ((0.3619 + 4.40)/0.3619) + ((0.42595 + 3.5)/0.42595), which equals 14.2362674 + 10.771987 + 6.67697985 + 13.1580547 + 9.21692687 = 54.06021582. So for which item of the top-5 (i.e. 527, 595, 608, 1097 and 588) is this number the prediction score? ..continue – easy May 24 '15 at 16:43
  • And this number is not in the range 1-5, so how can we turn it into a score between 1 and 5? – easy May 24 '15 at 16:43
  • OK. For the first question: **r(u,j)** is the actual rating of item **j** given by user **u**. It does not represent an average! It is a single user-item rating. – Bartłomiej Twardowski May 26 '15 at 13:05
  • Next, for the given example I cannot even help you with the calculation because I do not have **r(u,j)**. What's more, there is an error in how the weighted average was applied. Look here: (http://upload.wikimedia.org/math/7/5/c/75c5cdcd4fd787649fe5cd279de40ef2.png). Applied correctly, it should result in a value in the range 1-5. And once again: you are taking into account items unrated by the user. – Bartłomiej Twardowski May 26 '15 at 13:13
  • Dear @Bartłomiej Twardowski, thank you for your response. For my first question, your answer was that the rating should not be an average. But item j has not been rated by user **U**; it has been rated by other users, which is why I used the average. So should I use the same rating that user 1 (**U**) gave to item 1, or what should I do? – easy May 26 '15 at 16:17
  • For my second question, I applied the equation that was mentioned [ https://chart.googleapis.com/chart?cht=tx&chl=predict%28u%2Ci%29%3D%5Cfrac%7B%5Csum_%7Bj+%5Cin+NotRatedItems%28u%29%7Dsim%28i%2Cj%29r_%7Buj%7D%7D%7B%5Csum_%7Bj+%5Cin+NotRatedItems_%28u%29%7Dsim%28i%2Cj%29%7D], where I took the sum of the Tanimoto coefficient similarities multiplied by the ratings and divided it by the absolute value of the sum of the Tanimoto coefficient similarities. What should I do? I'm really stuck and don't know what to do with this. – easy May 26 '15 at 16:22
  • @easy Sorry, having read it once again I found out that there was an error in the equation generated from the URL. To make things clear I have placed the image directly. For the example from your first question it is hard to give an answer, because: (1) only one movie was watched by user 1, so taking a weighted average does not make sense, and (2) no rating by user 1 is provided. So I will edit my answer and add a simple example. – Bartłomiej Twardowski Jun 01 '15 at 11:47
  • Dear @Bartłomiej Twardowski, thank you very much for your valuable information. It is much clearer to me now. I will apply this solution and let you know about the results. Many thanks for your help, it is highly appreciated. – easy Jun 01 '15 at 17:37