I am designing an application that incorporates a recommendation system based on user interactions (collaborative filtering). On the homepage, the user is presented with a set of 6 items to interact with. There will be between 50 and 300 items. The following actions are possible:
- click on an item (strong interest)
- refresh an item (some interest)
- open a read-more dialog (some interest)
- do nothing and move on (no interest)
This data is collected and stored. The system should recommend items of interest to the user. I am thinking about turning this data into a rating system.
Option A) If the user clicks on an item, this is translated into an implicit lifetime rating of 5. Refreshing an item is a 4, and so on. So my user->item matrix would look like this:
item 1 | item 2 | item 3
john 5 4
jane 4
In this example, john has clicked on item 1 and refreshed item 3. The rating can only go up: if a user has previously refreshed an item, I write a 4 and update it to a 5 only if the item is clicked later.
Option B) Each time the user performs one of the above actions, I increment a scalar value for the item, which means it can grow without bound.
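To make the "rating can only go up" rule concrete, here is a minimal sketch of Option A. The values 5 and 4 come from the description above; the 3 for the read-more dialog, the action names, and the function `record_action` are assumptions made only for illustration.

```python
# Hypothetical action-to-rating mapping. 5 (click) and 4 (refresh) are from
# the description; 3 for "read_more" is an assumed value for "and so on".
ACTION_RATING = {"click": 5, "refresh": 4, "read_more": 3}


def record_action(ratings, user, item, action):
    """Keep only the highest rating ever seen for (user, item)."""
    new_rating = ACTION_RATING[action]
    user_ratings = ratings.setdefault(user, {})
    user_ratings[item] = max(user_ratings.get(item, 0), new_rating)
    return ratings


ratings = {}
record_action(ratings, "john", "item 3", "refresh")
record_action(ratings, "john", "item 1", "refresh")
record_action(ratings, "john", "item 1", "click")    # upgrades the 4 to a 5
record_action(ratings, "john", "item 1", "refresh")  # does not lower the 5
```

Items the user never interacted with are simply absent from the dictionary, which keeps the matrix sparse.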
item 1 | item 2 | item 3
john 55 1 30
jane 41 9
Maybe this is a problem, since the numbers are now harder to translate into a rating scale from 1 to 10.
Option C) I count every type of interaction separately:
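One possible way to deal with the unbounded counts is to compress them with a logarithm and normalise by the user's largest count. This is only a sketch under my own assumptions: the log damping, the per-user normalisation, and the names `add_action` and `to_scale` are not part of the design above, and the resulting scale runs from 0 (no interaction) to 10.

```python
import math
from collections import defaultdict


def add_action(scores, user, item):
    """Option B: every action increments the (user, item) counter by one."""
    scores[user][item] += 1


def to_scale(score, max_score, top=10.0):
    """Map a raw count onto [0, top] using log damping (an assumed choice)."""
    if score <= 0 or max_score <= 0:
        return 0.0
    return top * (math.log1p(score) / math.log1p(max_score))


scores = defaultdict(lambda: defaultdict(int))
# Reproduce john's row from the table above: 55, 1 and 30 actions.
for item, n in (("item 1", 55), ("item 2", 1), ("item 3", 30)):
    for _ in range(n):
        add_action(scores, "john", item)

john_max = max(scores["john"].values())
scaled = {item: to_scale(s, john_max) for item, s in scores["john"].items()}
```

With this mapping the ordering of the items is preserved, but the gap between 30 and 55 actions becomes much smaller than the gap between 1 and 30.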
item 1 click | item 1 refresh | item 1 read
john 3 1
jane 1 1
Here the problem is that "reading about" an item is probably only done once.
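For Option C, the storage could be sketched as one counter per user keyed by (item, action), flattened into a row with one column per item/action pair. The user "alice", the action names, and the helpers `log_action` and `to_row` are hypothetical and only illustrate the layout.

```python
from collections import Counter

# Column order per item, following the table header: click, refresh, read.
ACTIONS = ("click", "refresh", "read")


def log_action(counts, user, item, action):
    """Count each (item, action) pair separately for the given user."""
    counts.setdefault(user, Counter())[(item, action)] += 1


def to_row(counts, user, items):
    """Flatten a user's counters into one row: item-major, action-minor."""
    user_counts = counts.get(user, Counter())
    return [user_counts[(item, action)] for item in items for action in ACTIONS]


counts = {}
for _ in range(3):
    log_action(counts, "alice", "item 1", "click")
log_action(counts, "alice", "item 1", "read")

row = to_row(counts, "alice", ["item 1"])  # columns: click, refresh, read
```

Note that this triples the number of columns, so with up to 300 items each user vector would have up to 900 entries, most of them zero.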
Whichever option I choose, my idea is to first find similar users using something like cosine similarity or Pearson correlation. I would then pick the top 10 to 30 users from that list and compile a top list of their favorite items. From that list, I will recommend items that the current user has had little interaction with in the past.
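The pipeline described above could be sketched roughly like this, using cosine similarity on sparse rating dictionaries. Several details are my own assumptions rather than settled design: neighbour ratings are weighted by similarity, "little interaction" is expressed as a threshold `max_seen`, and the user "joe", "item 4", and the function names are made up for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse {item: rating} vectors."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings, user, k=10, n=6, max_seen=0):
    """Return up to n items liked by the k most similar users.

    Items the target user has already rated above max_seen are skipped.
    """
    target = ratings[user]
    sims = sorted(
        ((cosine(target, other), name)
         for name, other in ratings.items() if name != user),
        reverse=True,
    )
    neighbours = [(s, name) for s, name in sims[:k] if s > 0]

    scores = {}
    for sim, name in neighbours:
        for item, rating in ratings[name].items():
            if target.get(item, 0) <= max_seen:
                # Assumed weighting: similar users count more.
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


ratings = {
    "john": {"item 1": 5, "item 3": 4},
    "jane": {"item 1": 5, "item 2": 4},
    "joe": {"item 4": 5},  # shares nothing with john, so similarity is 0
}
suggestions = recommend(ratings, "john", k=2)
```

In this toy data only jane overlaps with john, so only her unseen item is suggested. It also shows the concern from the question: joe's item 4 can never reach john through neighbours alone.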
Is this something that could work? I am worried that finding similar users will eliminate the chance of finding interesting (new) items for the current user.