
I have to recommend videos to users. I have a CSV file containing userId, videoId, and productId. Each product ID covers many similar videos. For example:

userId videoId productId

1 2 1

1 3 1

1 5 2

2 7 2

2 8 1

2 2 1

For more clarity, I am factorizing it into separate relationships:

user and video relationship:

userId videoId

1 2

1 3

1 5

2 7

2 8

2 2

Consider user and video: user 1 is similar to user 2 because both watched videoId 2, so I will recommend videos 7 and 8 to user 1. Simple :)
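The user-overlap idea above can be sketched in a few lines of Python. This is only an illustration on the sample rows, not Mahout code; the function name `recommend_from_similar_users` and the scoring rule (count of shared videos) are assumptions made for the example.

```python
from collections import Counter

# (userId, videoId) pairs copied from the example table above
views = [(1, 2), (1, 3), (1, 5), (2, 7), (2, 8), (2, 2)]

def recommend_from_similar_users(user, views):
    """Score unseen videos by how many videos the user shares with each other viewer."""
    seen = {}
    for u, v in views:
        seen.setdefault(u, set()).add(v)
    mine = seen.get(user, set())
    scores = Counter()
    for other, theirs in seen.items():
        if other == user:
            continue
        overlap = len(mine & theirs)  # number of videos both users watched
        if overlap == 0:
            continue  # no shared video, so this user is not treated as similar
        for v in theirs - mine:
            scores[v] += overlap
    # highest score first, ties broken by videoId
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

print(recommend_from_similar_users(1, views))  # [(7, 1), (8, 1)]
```

With only one shared video the scores are all 1, so the ordering carries no information yet; on real data the overlap counts would differ per video.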

But the twist is that the actual product and video data looks like this:

videoId productId

2 1

3 1

5 2

7 2

8 1

2 1

4 1

6 1

Videos 4 and 6 also come under productId 1. If user 1 comes and watches videoId 2, I will have to recommend 7 and 8 (on the basis of similar users) and 4 and 6 (on the basis of similar videos under the same product, even though they are not present in the actual CSV).
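To make the two candidate sources concrete, here is a minimal Python sketch on the sample data. The names `catalogue` and `candidates` are hypothetical, and the way the two lists are built is an assumption for illustration, not a recommended algorithm.

```python
# (userId, videoId) pairs from the CSV, plus the full videoId -> productId mapping
views = [(1, 2), (1, 3), (1, 5), (2, 7), (2, 8), (2, 2)]
catalogue = {2: 1, 3: 1, 5: 2, 7: 2, 8: 1, 4: 1, 6: 1}

def candidates(user, current_video, views, catalogue):
    """Return (videos from similar users, videos under the same product)."""
    seen = {}
    for u, v in views:
        seen.setdefault(u, set()).add(v)
    mine = seen.get(user, set())

    # Videos watched by other users who also watched current_video
    by_user = set()
    for other, theirs in seen.items():
        if other != user and current_video in theirs:
            by_user |= theirs

    # Videos filed under the same product, whether or not anyone watched them
    product = catalogue[current_video]
    by_product = {v for v, p in catalogue.items() if p == product}

    exclude = mine | {current_video}  # never recommend what the user already saw
    return sorted(by_user - exclude), sorted(by_product - exclude)

print(candidates(1, 2, views, catalogue))  # ([7, 8], [4, 6, 8])
```

Note that video 8 shows up in both lists, since user 2 watched it and it also belongs to productId 1; any merge step would have to decide how to treat such overlaps.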

My question is:

  1. Do I need to factorize the CSV?

  2. What is the best algorithm for this?

  3. After getting the resulting videos, how should I rank them?

ricky

1 Answer


What do you want to recommend: products or videos? Choose one and drop the other ID from the input; I don't see what use it serves. The recommendations come back ordered, with estimated preference weights.

Which version of the Mahout recommenders to use depends on how much data you have (how many users and items) and on how often you get new preference data. All of the Mahout 0.9 recommenders can only recommend to users who have expressed preferences, and they only use the preferences that went into calculating the model.

Mahout 1.0 has a completely different mechanism that can recommend to anonymous or new users as long as you have some preference data for them. This data need not be in the model built by Mahout. This method requires the use of a search engine like Solr or Elasticsearch.
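To give a rough feel for the idea of scoring a user's history against a precomputed item model, here is a small Python sketch. It is not Mahout's API: it uses raw co-occurrence counts and an in-memory lookup where the real pipeline would use a search engine, and the function names `build_cooccurrence` and `recommend` are made up for the illustration.

```python
from collections import defaultdict
from itertools import permutations

# (userId, videoId) pairs from the question's sample data
views = [(1, 2), (1, 3), (1, 5), (2, 7), (2, 8), (2, 2)]

def build_cooccurrence(views):
    """Count, for each video, how often every other video was watched by the same user."""
    by_user = defaultdict(set)
    for u, v in views:
        by_user[u].add(v)
    co = defaultdict(lambda: defaultdict(int))
    for vids in by_user.values():
        for a, b in permutations(vids, 2):
            co[a][b] += 1
    return co

def recommend(history, co):
    """Score videos for any history, even one that was not part of the model."""
    scores = defaultdict(int)
    for v in history:
        for other, n in co.get(v, {}).items():
            if other not in history:
                scores[other] += n
    # highest score first, ties broken by videoId
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

co = build_cooccurrence(views)
# An anonymous visitor who has only watched video 2
print(recommend({2}, co))  # [(3, 1), (5, 1), (7, 1), (8, 1)]
```

The history passed to `recommend` does not have to belong to a user seen at model-build time, which is the property described above; the returned list is already ranked by score.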

Mahout docs: http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html

A presentation I put together: http://www.slideshare.net/pferrel/unified-recommender-39986309

pferrel