I am building a multivariate model for direct time series forecasting, where the goal is to make 4- and 8-step-ahead forecasts using random forest and support vector regression (SVR). The results look very similar to those of my 1-step-ahead forecast, and I am wondering whether my code is sensible.
Here is an example of 4-step-ahead forecasts using a random forest together with the predict function.
As far as I understand, the only difference between the 1-step-ahead and the 4-step-ahead direct forecast is that the fourth row after the end of the training data, rather than the first, is passed to the predict function. That is, for a training set ending at row i, I use
test <- mydata_2diff[(i+4), ]
instead of
test <- mydata_2diff[(i+1), ]
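To make the indexing concrete, here is a minimal sketch in base R of which rows each forecast origin uses. The helper name origin_rows and the row count n = 120 are hypothetical and chosen only for illustration; the training end of 112 and the horizon of 4 come from my setup.

```r
# Hypothetical helper: for each forecast origin i, report the last training
# row and the single row handed to predict() for an h-step-ahead forecast.
origin_rows <- function(n, train_end, h) {
  origins <- train_end:(n - h)           # same range as the outer loop
  data.frame(train_last = origins,       # training data are rows 1:train_last
             test_row   = origins + h)   # row passed to predict()
}

n <- 120  # assumed number of rows, for illustration only
rows_1 <- origin_rows(n, train_end = 112, h = 1)
rows_4 <- origin_rows(n, train_end = 112, h = 4)

print(rows_4)

# The number of forecasts is n - train_end - h + 1,
# which is n - train_end - 3 for h = 4.
print(nrow(rows_4) == n - 112 - 3)
```

In both cases the model is fitted on rows 1:i with the target and the predictors taken from the same row; only the row handed to predict changes.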
My code looks as follows:
library(randomForest) # randomForest()
library(tictoc) # tic() and toc(); assuming these come from the tictoc package

train_end <- 112 # End of the initial training set
j <- 1 # Row index into the prediction matrix
k_max <- 10 # Number of RF estimations per forecast origin
pred_rf_4Q_dir <- matrix(0, nrow(mydata_2diff) - train_end - 3, k_max) # Prediction matrix
{
  tic()
  for (i in train_end:(nrow(mydata_2diff) - 4)) {
    train <- mydata_2diff[1:i, ] # Training data: rows 1 to i
    test <- mydata_2diff[i + 4, ] # Test data: the single row 4 steps past i
    for (k in 1:k_max) {
      rf_RPI <- randomForest(RPI ~ RGDP + CPI + STI + LTI + UE + SER + SPI + ARH,
                             data = train, ntree = 500, importance = TRUE)
      pred_rf <- predict(rf_RPI, newdata = test, predict.all = TRUE)
      pred_rf_4Q_dir[j, k] <- pred_rf[["aggregate"]] # Forest-average prediction
    }
    j <- j + 1 # Move to the next forecast origin
  }
  toc()
}
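For anyone who wants to run the loop structure without my data, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the data frame toy is random noise, it keeps only two of the predictors, and lm stands in for randomForest so that the example needs nothing beyond base R.

```r
set.seed(1)  # simulated data, for illustration only
n <- 120
toy <- data.frame(RGDP = rnorm(n), CPI = rnorm(n))
toy$RPI <- 0.5 * toy$RGDP - 0.3 * toy$CPI + rnorm(n, sd = 0.1)

train_end <- 112
h <- 4                                   # forecast horizon
origins <- train_end:(n - h)             # forecast origins, as in the outer loop
pred_lm_4Q <- numeric(length(origins))   # one forecast per origin

for (j in seq_along(origins)) {
  i <- origins[j]
  train <- toy[1:i, ]                    # rows 1 to i
  test  <- toy[i + h, ]                  # the single row h steps past i
  fit <- lm(RPI ~ RGDP + CPI, data = train)  # lm stands in for randomForest
  pred_lm_4Q[j] <- predict(fit, newdata = test)
}

print(length(pred_lm_4Q))
```

Iterating with seq_along(origins) replaces the manual counter j, so the prediction vector and the loop range cannot drift out of step.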
Is this approach correct or not?
I am grateful for any feedback.