I'm trying to create a sparse matrix from a data frame without first building a dense matrix, which causes serious memory issues.

I found the following SO post, where a solution seems to have been found: Create Sparse Matrix from a data frame

I've tried this solution, but it doesn't work for me, perhaps because my UserID and MovieID values don't start at 1.

Here is my sample code:

library(Matrix)

UserID<-c(10090,10090,10090,10316,10316)
MovieID <-c(63155,63530,63544,63155,63545)
Rating <-c(2,2,1,2,1)
trainingData<-data.frame(UserID,MovieID,Rating)
trainingData

UIMatrix <- sparseMatrix(i = trainingData$UserID,
                         j = trainingData$MovieID,
                         x = trainingData$Rating)

dim(UIMatrix)

I expected to get a 2 x 4 matrix (two users, four distinct movies), but the dims correspond to the maximum user and movie id.

I've also tried the second solution suggested in that post, but it doesn't work with my data either.

Can anyone give some advice?

Nelson
    If I understand correctly, `i` and `j` are row/column indices. So `trainingData$UserID` gives row indices `10090, ...`, and the column indices are also large. Therefore the matrix has to be big enough to hold those row/column indices. – akrun Feb 10 '15 at 11:51
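
The point made in this comment can be checked directly: when `dims` is not supplied, `sparseMatrix` sizes the result from the largest row and column index it is given. A minimal sketch using the sample data from the question (the name `m` is only for illustration):

```r
library(Matrix)

UserID  <- c(10090, 10090, 10090, 10316, 10316)
MovieID <- c(63155, 63530, 63544, 63155, 63545)
Rating  <- c(2, 2, 1, 2, 1)

# The raw IDs are used as row/column positions, so the matrix
# spans max(UserID) rows and max(MovieID) columns.
m <- sparseMatrix(i = UserID, j = MovieID, x = Rating)

dim(m)
# [1] 10316 63545

# Only the five ratings are actually stored.
nnzero(m)
# [1] 5
```

The matrix is still sparse in memory; it is only the dimensions that are larger than expected.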

1 Answer

You can convert your indices to indices starting at one with as.integer(as.factor(.)).

UIMatrix <- sparseMatrix(i = as.integer(as.factor(trainingData$UserID)),
                         j = as.integer(as.factor(trainingData$MovieID)),
                         x = trainingData$Rating)

dim(UIMatrix)
# [1] 2 4

dimnames(UIMatrix) <- list(sort(unique(trainingData$UserID)),
                           sort(unique(trainingData$MovieID)))

UIMatrix
# 2 x 4 sparse Matrix of class "dgCMatrix"
#       63155 63530 63544 63545
# 10090     2     2     1     .
# 10316     2     .     .     1
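
As a possible variation on the code above, the factors can be built once and reused, so that the indices and the labels are guaranteed to come from the same levels. This is only a sketch: the names `users`, `movies`, and `UIMatrix2` are made up for illustration, and it relies on the `dims` and `dimnames` arguments of `sparseMatrix`.

```r
library(Matrix)

trainingData <- data.frame(
  UserID  = c(10090, 10090, 10090, 10316, 10316),
  MovieID = c(63155, 63530, 63544, 63155, 63545),
  Rating  = c(2, 2, 1, 2, 1)
)

# Build each factor once; its integer codes are the 1-based indices
# and its levels are the matching row/column labels.
users  <- factor(trainingData$UserID)
movies <- factor(trainingData$MovieID)

UIMatrix2 <- sparseMatrix(
  i = as.integer(users),
  j = as.integer(movies),
  x = trainingData$Rating,
  dims = c(nlevels(users), nlevels(movies)),
  dimnames = list(levels(users), levels(movies))
)

# Ratings can then be looked up by the original IDs, given as strings.
UIMatrix2["10316", "63545"]
# [1] 1
```

Keeping `levels(users)` and `levels(movies)` around also gives a way to map row and column positions back to the original IDs later.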
Sven Hohenstein