
I have user-event data containing the ratings users gave to events they attended. I am trying to use the Jama library to compute a similarity matrix, which needs a two-dimensional array as input (a user-event matrix of event ratings).

The data has three columns: userID, eventID, rating.

However, there are around two million users and around one million events, so the full matrix does not fit in memory on my system. What would be an efficient way to use the Jama library to compute the similarity matrix at this scale? I am using Java.

Thanks, Aman

aman
  • Buy a system with the 2 terabytes of memory you'd need to store that matrix...? Or find a mathematical model that requires less memory, or store most of the data on the hard drive and only load parts of the matrix at a time. – Flight Odyssey Jul 27 '14 at 14:13
  • With Java you run into trouble: even on 64-bit machines you have a limited amount of memory. – kiltek Jul 27 '14 at 14:51
  • Would the Mahout library (on Hadoop) prove fruitful, given that I have only a single machine? – aman Jul 27 '14 at 20:56
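
To make the "model that requires less memory" suggestion concrete: a dense 2,000,000 × 1,000,000 `double[][]`, which is what a Jama `Matrix` wraps, would hold 2 × 10^12 entries, or roughly 16 TB at 8 bytes per entry (about 2 TB even at one byte per entry). Most users attend only a handful of events, so one possible alternative is to store only the observed ratings and compute similarities pair by pair. The following is a minimal sketch under that assumption. It does not use Jama at all, and the class name `SparseSimilarity` and its methods are hypothetical, introduced only for illustration. Cosine similarity is also an assumption, since the question does not say which similarity measure is wanted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: sparse user -> (event -> rating) storage with
// pairwise cosine similarity. Not part of Jama or any other library.
class SparseSimilarity {

    // Only observed (userID, eventID, rating) triples are stored,
    // so memory grows with the number of ratings, not users x events.
    private final Map<Long, Map<Long, Double>> ratingsByUser = new HashMap<>();

    // Record one row of the three-column input data.
    public void addRating(long userId, long eventId, double rating) {
        ratingsByUser.computeIfAbsent(userId, k -> new HashMap<>()).put(eventId, rating);
    }

    // Cosine similarity between two sparse rating vectors.
    // Events missing from a map are treated as a rating of 0.
    public static double cosine(Map<Long, Double> a, Map<Long, Double> b) {
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0;
        }
        // Iterate over the smaller map and probe the larger one.
        Map<Long, Double> small = a.size() <= b.size() ? a : b;
        Map<Long, Double> large = (small == a) ? b : a;
        double dot = 0.0;
        for (Map.Entry<Long, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    // Euclidean norm of a sparse vector.
    private static double norm(Map<Long, Double> v) {
        double sumOfSquares = 0.0;
        for (double x : v.values()) {
            sumOfSquares += x * x;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Similarity between two users; 0.0 if either user has no ratings.
    public double similarity(long userA, long userB) {
        Map<Long, Double> a = ratingsByUser.get(userA);
        Map<Long, Double> b = ratingsByUser.get(userB);
        if (a == null || b == null) {
            return 0.0;
        }
        return cosine(a, b);
    }

    public static void main(String[] args) {
        SparseSimilarity s = new SparseSimilarity();
        s.addRating(1L, 10L, 5.0);
        s.addRating(1L, 11L, 3.0);
        s.addRating(2L, 10L, 4.0);
        s.addRating(2L, 12L, 2.0);
        System.out.println(s.similarity(1L, 2L));
    }
}
```

Note that the full user-user similarity matrix (2 million × 2 million) would itself be too large to hold in memory, so with this approach similarities would have to be computed on demand, or only the top few neighbours per user kept.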

0 Answers