
I am using RStudio on an 8 GB RAM machine (MacBook Pro). I also use RStudio Server on AWS with 15 GB RAM.

Neither could finish som() on a dataset of 800,000+ records. Even 100,000 records seems to run forever.

I wonder if there's a practical data-size limit for the R kohonen package? And if so, how should I go about running a SOM on data this big?

UPDATE: The RStudio Server run finally finished, with this error:

Error in matrix(0, nd * ncodes, nmaps) : invalid 'nrow' value (too large or NA)
In addition: Warning message:
In nd * ncodes : NAs produced by integer overflow

So what's the limit then?
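For what it's worth, the overflow in that error message can be reproduced directly in plain R. The traceback shows kohonen calling matrix(0, nd * ncodes, nmaps), where nd is the number of data rows and ncodes the number of map units; the sketch below uses a hypothetical ncodes of 5000, not a value from the question:

```r
# R integers are 32-bit, so the largest representable value is
# .Machine$integer.max = 2^31 - 1 = 2,147,483,647. When both operands
# are integers, a product beyond that limit overflows to NA, which
# matrix() then rejects as an invalid 'nrow'.

nd     <- 800000L   # data rows, as in the question
ncodes <- 5000L     # hypothetical map size (e.g. a very large grid)

bad  <- suppressWarnings(nd * ncodes)   # integer * integer -> NA (overflow)
good <- as.numeric(nd) * ncodes         # double arithmetic -> 4e9, no overflow

is.na(bad)                    # TRUE
good > .Machine$integer.max   # TRUE
```

So, assuming this code path, the practical limit is roughly nd * ncodes <= 2^31 - 1; with 800,000 rows that would cap the map at about 2,684 units.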

ainunnajib

2 Answers


I had the same problem; it turned out I had not converted some of my data to a matrix.

The kohonen package does not handle data frames well. Make sure to use:

as.matrix(data)

e.g.

som_model <- som(data = as.matrix(trainingset), grid = som_grid,
                 rlen = 1000, alpha = c(0.05, 0.01),
                 keep.data = TRUE,
                 n.hood = "circular")  # n.hood applies to kohonen 2.x
prediction <- predict(som_model, newdata = as.matrix(testset),
                      trainX = as.matrix(trainingset), trainY = cl)
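To see why the conversion matters, here's a quick base-R check (df below is just a stand-in for your training data; no kohonen required):

```r
df <- data.frame(x = c(1.5, 2.5), y = c(3.0, 4.0))

is.matrix(df)       # FALSE -- a data frame is a list of columns, not a matrix
m <- as.matrix(df)
is.matrix(m)        # TRUE
is.numeric(m)       # TRUE, because every column of df was numeric

# Caution: if df contained a factor or character column, as.matrix(df)
# would silently produce a *character* matrix, which som() cannot train on.
# Drop or numerically encode non-numeric columns before converting.
```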

Rather than passing 'trainingset' as trainX, I recycle som_model's stored data (available since keep.data = TRUE), so I don't have to keep an extra copy of the training set around when memory is a constraint.

prediction <- predict(som_model, newdata = as.matrix(testset),
                      trainX = som_model$data, trainY = cl)
Choc_waffles