
I am trying to build a recommender system using Spark's MLlib library (in Scala). To use the ALS train method, I need to build a rating matrix using the Rating() method (Rating is a case class in the package org.apache.spark.mllib.recommendation). It requires an Int to be passed as the user ID. However, the dataset I am working with has 11-digit IDs, so passing them throws an error.
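A minimal sketch of why 11-digit IDs do not fit, assuming they are held as `Long` values (the example ID is hypothetical). `Int.MaxValue` is 2147483647, only 10 digits. Scala will not implicitly narrow a `Long` to an `Int`, which is presumably the error reported; an explicit `.toInt` compiles but silently keeps only the low 32 bits:

```scala
object IdOverflow {
  // Largest value a JVM Int can hold: 2147483647, which has only 10 digits.
  val maxInt: Long = Int.MaxValue.toLong

  // Hypothetical 11-digit user ID, for illustration only.
  val elevenDigitId: Long = 12345678901L

  // An explicit narrowing conversion compiles, but keeps only the low
  // 32 bits, so the result is a different (here negative) number.
  val truncated: Int = elevenDigitId.toInt

  // True only when the ID can be represented exactly as an Int.
  def fitsInInt(id: Long): Boolean =
    id >= Int.MinValue && id <= Int.MaxValue
}
```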

Does anyone know if there is some way around this, so that I can pass a Long value into the Rating method? Or some way to override this method? Or some way to uniquely convert the 11-digit number to 10 or 9 digits while keeping it an Int?

Any help will be greatly appreciated. Thanks.

zero323
shahharsh2603

1 Answer


This will depend, I think, on the range of your IDs. Can you simply take the ID modulo Int.MaxValue? That is:

(id % Int.MaxValue).toInt
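A sketch of the modulo idea with a collision check; the helper names `toIntId` and `isCollisionFree` are hypothetical. Because `%` is not one-to-one, two IDs that differ by a multiple of `Int.MaxValue` land on the same `Int`, which would silently merge two users' ratings:

```scala
object ModuloIds {
  // Fold a Long ID into Int range. id % Int.MaxValue lies strictly between
  // -Int.MaxValue and Int.MaxValue, so the .toInt conversion never wraps.
  def toIntId(id: Long): Int = (id % Int.MaxValue).toInt

  // The fold is not one-to-one, so check the IDs actually in use:
  // true when no two distinct IDs map to the same Int.
  def isCollisionFree(ids: Seq[Long]): Boolean = {
    val distinctIds = ids.distinct
    distinctIds.map(toIntId).distinct.size == distinctIds.size
  }
}
```

Running `isCollisionFree` over the distinct IDs actually present is one way to decide whether this shortcut is safe for a particular dataset.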

Or can you just hash it to an Int?

id.hashCode
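A sketch of the hashing variant (the `hashId` name is hypothetical). For a `Long`, `hashCode` XORs the high and low 32-bit halves, the same value as `java.lang.Long.hashCode`, so it collides too. The ID 21474836485 = 5 * 2^32 + 5 below is constructed for illustration and hashes to the same `Int` as 0:

```scala
object HashIds {
  // For a Long, hashCode is the JVM's Long hash: the high and low 32-bit
  // halves XORed together. Fast, but many Longs share each Int result.
  def hashId(id: Long): Int = id.hashCode
}
```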
Will Fitzgerald
  • The former seems like a fairly reasonable thing to do. However, I wasn't sure about the range of my values. I was sure, though, that only a few million needed to be considered at one time, so I just created my own sort of lookup table for them. – shahharsh2603 Jun 20 '14 at 13:15
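The lookup-table approach from the comment above can be sketched as follows. The asker's actual table is not shown, so its shape here is an assumption: each distinct `Long` ID gets a dense `Int` index, and a reverse map translates results back. `Rating` is a local stand-in with the same fields as MLlib's `Rating(user: Int, product: Int, rating: Double)` so the sketch runs without Spark, and the raw-tuple input format is hypothetical:

```scala
object IdLookup {
  // Local stand-in mirroring the fields of
  // org.apache.spark.mllib.recommendation.Rating, so this runs without Spark.
  case class Rating(user: Int, product: Int, rating: Double)

  // Assign each distinct Long ID a dense Int index 0, 1, 2, ...
  // in order of first appearance. One-to-one as long as there are
  // no more than Int.MaxValue distinct IDs.
  def buildIndex(ids: Seq[Long]): Map[Long, Int] =
    ids.distinct.zipWithIndex.toMap

  // Reverse map, to translate recommended user indices back to original IDs.
  def invert(index: Map[Long, Int]): Map[Int, Long] =
    index.map { case (id, i) => (i, id) }

  // Hypothetical raw input: (userId as Long, productId as Int, rating).
  def toRatings(raw: Seq[(Long, Int, Double)], index: Map[Long, Int]): Seq[Rating] =
    raw.map { case (user, product, score) => Rating(index(user), product, score) }
}
```

On an RDD, `distinct().zipWithIndex()` yields `Long` indices, which can be narrowed with `.toInt` once the number of distinct IDs is known to be below `Int.MaxValue`.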