I'm trying to fit gbm models in a loop in R, each with a different learning rate. For each model I want to calculate a few statistics and combine them with the original data set.
But I'm getting an error: each time a statistic is calculated, it is saved under the same name as the previous one, and the duplicate names cause the error.
I get the following error at the end of the loop:
```
Error in `[<-.data.frame`(`*tmp*`, nl, value = list(dates = c(14824, 14825, :
  duplicate subscripts for columns
```
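The naming clash can be reproduced in base R without gbm. In this minimal sketch (toy data, hypothetical values), `cbind()` receives the *variables* `predTrainGbm` and `mapeTrain`, which hold only the name strings, not the predictions or the MAPE. `cbind()` on a data frame names each new column after the variable passed in and does not make the names unique, so every iteration appends another pair of columns called `predTrainGbm` and `mapeTrain`. Those duplicated names are likely what a later `[<-.data.frame` call rejects.

```r
# Toy stand-in for the stock data (hypothetical values)
train <- data.frame(Close = c(10, 11, 12))

for (i in c(0.07, 0.08)) {
  predTrainGbm <- paste("gbmTrainPrediction", i, sep = "")
  mapeTrain <- paste("mapeGbmTrain", i, sep = "")
  # cbind() names the new columns after the variables (predTrainGbm, mapeTrain);
  # their content is the name string, e.g. "gbmTrainPrediction0.07"
  train <- cbind(train, predTrainGbm, mapeTrain)
}

print(names(train))
# "Close" "predTrainGbm" "mapeTrain" "predTrainGbm" "mapeTrain"
```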
The `train` data is basically stock price data with dates, open, high, close, etc.
Here is the code:
```r
learningRateList <- as.numeric(7:9) * 0.01

for (i in learningRateList) {
  # fit one gbm model per learning rate and keep it as gbmModel<i>
  modelNames <- paste("gbmModel", i, sep = "")
  gbmModels <- gbm.step(data = train, gbm.x = reqCol, gbm.y = CloseCol,
                        tree.complexity = 9, learning.rate = i,
                        bag.fraction = 0.75, family = "laplace", step.size = 100)
  assign(modelNames, gbmModels)

  # training data
  # predict values for the training data set
  predTrainGbm <- paste("gbmTrainPrediction", i, sep = "")
  gbmTrainPrediction <- predict.gbm(gbmModels, train,
                                    n.trees = gbmModels$gbm.call$best.trees)
  assign(predTrainGbm, gbmTrainPrediction)

  # test-set predictions (used in the second plot below)
  gbmTestPrediction <- predict.gbm(gbmModels, test,
                                   n.trees = gbmModels$gbm.call$best.trees)

  # calculate mape for the predictions
  mapeTrain <- paste("mapeGbmTrain", i, sep = "")
  mapeTrainGbm <- regr.eval(train$Close, gbmTrainPrediction, stats = "mape")
  assign(mapeTrain, mapeTrainGbm)

  # intended: add the predictions and the MAPE as new columns of train
  train <- cbind(train, predTrainGbm, mapeTrain)

  # creating plots of actual vs predicted values
  imageGbmName <- paste(fileCalculated, "Gbm Prediction", i, ".png")
  png(imageGbmName)
  par(mfrow = c(2, 1))
  plot(train$Close, type = "l", col = "red", main = "Training set")
  lines(gbmTrainPrediction, col = "green")
  plot(test$Close, type = "l", col = "red", main = "Test Set")
  lines(gbmTestPrediction, col = "green")
  dev.off()
}
```
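One way to avoid the clash, sketched under assumptions rather than as a definitive fix: keep the models in a named list instead of `assign()`, write the prediction *values* into `train` with `[[<-` under a column name that includes the learning rate, and collect the scalar MAPE values in a separate table with one row per learning rate. So that the snippet runs with base R only, `lm()` on toy data stands in for `gbm.step()` and MAPE is computed directly with the usual formula (as a fraction) instead of `regr.eval()`; the names `models`, `mapeSummary`, and the toy columns are hypothetical.

```r
# Hypothetical toy data standing in for the stock prices
set.seed(1)
train <- data.frame(Open = runif(50, 90, 110))
train$Close <- train$Open + rnorm(50)

learningRateList <- (7:9) * 0.01

# Named list of fitted models instead of assign()-ed global variables
models <- list()
# Scalar statistics: one row per learning rate, not one column per model
mapeSummary <- data.frame(learningRate = learningRateList, mape = NA_real_)

for (k in seq_along(learningRateList)) {
  i <- learningRateList[k]

  # Stand-in model: in the real loop this would be the gbm.step() call
  fit <- lm(Close ~ Open, data = train)
  pred <- predict(fit, newdata = train)
  models[[paste0("gbmModel", i)]] <- fit

  # Store the prediction values under a unique, rate-specific column name
  train[[paste0("gbmTrainPrediction", i)]] <- pred

  # Mean absolute percentage error, expressed as a fraction
  mapeSummary$mape[k] <- mean(abs(train$Close - pred) / train$Close)
}

print(names(train))
print(mapeSummary)
```

Each iteration now adds exactly one uniquely named column, and the MAPE is not repeated on every row of `train`. If `gbm.x` is given as column indices (like `reqCol`), keeping it pointed at the original predictor columns means later models are not trained on earlier models' predictions.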