Hi, over the last few days I have run into a small/big problem.
I have a transaction dataset with 1 million rows and two columns (client ID and product ID), and I want to transform it into a binary matrix. I used the reshape and spread functions, but in both cases the process used up my 64 MB of RAM and RStudio/R went down. And because I only use 1 CPU, the process takes a very long time. My question is: what is the next step forward in this transition from small data to big data? How can I use more CPUs?
I searched and found a couple of solutions, but I need an expert opinion:
1 - Use SparkR? (see the sketch after this list)
2 - An H2O.ai solution? http://h2o.ai/product/enterprise-support/
3 - Revolution Analytics? http://www.revolutionanalytics.com/big-data
4 - Move to the cloud, e.g. Microsoft Azure?
If needed, I can use a virtual machine with a lot of cores, but I need to know what the smoothest way to make this transition is.
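To make option 1 a bit more concrete, here is a rough sketch of how this pivot might look in SparkR. This is only an illustration under assumptions: I am assuming Spark 2.x or later, a local session using all cores, and that SparkR's groupBy/pivot/agg behave as documented; I have not benchmarked it on the full 1 million rows.

library(SparkR)
sparkR.session(master = "local[*]")   # local Spark session using every available core
sdf <- createDataFrame(Sell)          # copy the R data.frame into a Spark DataFrame
# pivot the Code values into columns, counting occurrences per UserId
wide <- agg(pivot(groupBy(sdf, "UserId"), "Code"), n(sdf$Code))
head(wide)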
My specific problem
I have this data.frame (but with 1 million rows)
Sell <- data.frame(UserId = c(1,1,1,2,2,3,4), Code = c(111,12,333,12,111,2,3))
and I did:
library(tidyr)                    # spread() comes from tidyr
Sell[, 3] <- 1                    # add an indicator column (it gets the name V3)
test <- spread(Sell, Code, V3)    # one column per Code, with V3 as the cell value
This works with a small data set, but with 1 million rows it takes a long time (12 hours) and goes down because my maximum RAM is 64 MB. Any suggestions?
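(For what it's worth, the memory blow-up comes from materialising the full dense users-by-products grid. Purely as an illustration of a sparse alternative, and not something I have run at full scale, the same table can be built as a sparse matrix where only the non-zero cells are stored; the xtabs(..., sparse = TRUE) approach below relies on the Matrix package.)

library(Matrix)   # sparse matrix classes used by xtabs(..., sparse = TRUE)
# sparse contingency table: rows = UserId, columns = Code, cells = counts
m <- xtabs(~ UserId + Code, data = Sell, sparse = TRUE)
m <- (m > 0) * 1  # cap counts at 1 so the matrix is strictly binary
m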