My training dataset has about 200,000 records and 500 features (sales data from a retail org). Most of the features are 0/1 and are stored in a sparse matrix.
The goal is to predict the probability of purchase for about 200 products, so I need to use the same 500 features to predict a purchase probability for each of the 200 products. Since glmnet is a natural choice for model creation, I thought about fitting the 200 glmnet models in parallel (all 200 models are independent of each other), but I am stuck using foreach. The code I executed was:
foreach(i = 1:ncol(target)) %dopar% {
  assign(model[i], cv.glmnet(x, target[, i], family = "binomial", alpha = 0,
                             type.measure = "auc", grouped = FALSE,
                             standardize = FALSE, parallel = TRUE))
}
model is a character vector holding the 200 model names under which I want to store the respective fits.
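For reference, %dopar% only runs in parallel when a backend is registered; otherwise foreach warns and falls back to sequential execution. I set up the backend along these lines (a minimal sketch assuming doParallel; the worker count is illustrative):

library(doParallel)

cl <- makeCluster(4)   # worker count is illustrative
registerDoParallel(cl)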
The following code works, but it doesn't exploit the parallel structure and takes about a day to finish!
for (i in 1:ncol(target)) {
  assign(model[i], cv.glmnet(x, target[, i], family = "binomial", alpha = 0,
                             type.measure = "auc", grouped = FALSE,
                             standardize = FALSE, parallel = TRUE))
}
Can someone point me to how to exploit the parallel structure in this case?
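For what it's worth, from the foreach documentation I suspect the loop body should return each fit (foreach then collects the results into a list) rather than call assign() inside %dopar%, because those assignments happen in the workers' environments and never reach the master session. A minimal sketch of that pattern, assuming x, target, and model are as described above, and with parallel = FALSE in cv.glmnet so the cross-validation does not try to nest parallelism inside the outer loop:

library(glmnet)
library(doParallel)  # backend registered via makeCluster()/registerDoParallel() as above

# Return the fit from the loop body; foreach gathers the 200 fits into a list.
# .packages makes glmnet available on each worker.
fits <- foreach(i = 1:ncol(target), .packages = "glmnet") %dopar% {
  cv.glmnet(x, target[, i], family = "binomial", alpha = 0,
            type.measure = "auc", grouped = FALSE,
            standardize = FALSE,
            parallel = FALSE)  # avoid nesting parallelism on the workers
}
names(fits) <- model  # reuse the 200 model names as list names

Is returning the fits like this the right way to go, or does keeping parallel = TRUE inside cv.glmnet interfere with the outer %dopar% loop?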