I have the following data table and want to predict DE prices from its other variables using a generalized linear model (GLM).
library(data.table)
set.seed(123)
dt.data <- data.table(date = seq(as.Date('2019-01-01'), by = '1 day', length.out = 731),
'DE' = rnorm(731, 30, 1), 'windDE' = rnorm(731, 10, 1),
'consumptionDE' = rnorm(731, 50, 1), 'nuclearDE' = rnorm(731, 8, 1),
'solarDE' = rnorm(731, 1, 1), check.names = FALSE)
dt.forecastData <- dt.data
dt.forecastData <- na.omit(dt.forecastData)
fromTestDate <- "2019-12-31"
fromDateTest <- base::toString(fromTestDate)
## Create train and test date-vectors depending on fromDateTest: ##
v.train <- which(dt.forecastData$date <= fromDateTest)
v.test <- which(dt.forecastData$date == as.Date(fromDateTest)+1)
## Create data tables for train and test data with specific date range (fromTestDate): ##
dt.train <- dt.forecastData[v.train]
v.trainDate <- dt.train$date
dt.test <- dt.forecastData[v.test]
v.testDate <- dt.test$date
## Delete column "date" of train and test data for model fitting
## (":=" modifies by reference, so no re-assignment is needed): ##
dt.train[, date := NULL]
dt.test[, date := NULL]
## MODEL FITTING: ##
## Generalized Linear Model (no intercept, identity link, constant variance): ##
glmModel <- stats::glm(DE ~ . - 1, data = dt.train,
                       family = quasi(link = "identity", variance = "constant"))
## Train and Test Data PREDICTION with glmModel: ##
dt.train$prediction <- stats::predict.glm(glmModel, newdata = dt.train)
dt.test$prediction <- stats::predict.glm(glmModel, newdata = dt.test)
## Add date columns to dt.train and dt.test: ##
dt.train <- data.table(date = v.trainDate, dt.train)
dt.test <- data.table(date = v.testDate, dt.test)
In this code I train the model on the data from 2019-01-01 to 2019-12-31 and test it with the day-ahead forecast for 2020-01-01.
Now I want to create a for-loop so that I run my model 365 times in total, as follows:
Run 1:
a) use 01-01-2019 to 31-12-2019 to train my model
b) predict for 01-01-2020 (test data)
c) use the actual data point for 01-01-2020 to evaluate the prediction
Run 2:
a) use 01-01-2019 to 01-01-2020 to train my model
b) predict for 02-01-2020
c) use the actual data point for 02-01-2020 to evaluate the prediction
etc.
In the end, I want to plot, for example, the cumulative sum of the individual prediction performances, or a histogram of them, together with some summary statistics (mean, median, sd, etc.).
Unfortunately, I don't know how to start with the loop or where to store the predictions from each run. I hope someone can help me with this!
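A possible starting point, given as a minimal sketch rather than a definitive implementation: an expanding-window loop that refits the same GLM before each forecast day and stores one row per run in a pre-allocated list. The names `v.forecastDates`, `v.modelCols`, `l.results`, `dt.results` and `fit` are made up for this illustration. Using the absolute error as the "prediction performance" is also an assumption, so swap in whatever metric is actually needed. Note that with the simulated data above the loop runs 366 times rather than 365, because 2020 is a leap year.

```r
library(data.table)

## Same simulated data as in the question
set.seed(123)
dt.data <- data.table(date = seq(as.Date("2019-01-01"), by = "1 day", length.out = 731),
                      DE = rnorm(731, 30, 1), windDE = rnorm(731, 10, 1),
                      consumptionDE = rnorm(731, 50, 1), nuclearDE = rnorm(731, 8, 1),
                      solarDE = rnorm(731, 1, 1))

## Every day after the initial training window gets one run
## (assumption: the initial window ends on 2019-12-31, as in the question)
v.forecastDates <- dt.data[date > as.Date("2019-12-31"), date]

## Columns used for model fitting (everything except "date")
v.modelCols <- setdiff(names(dt.data), "date")

## Pre-allocate one list slot per run; binding once at the end is cheaper
## than growing a data.table inside the loop
l.results <- vector("list", length(v.forecastDates))

for (i in seq_along(v.forecastDates)) {
  testDate <- v.forecastDates[i]

  ## Expanding window: train on all days strictly before the forecast day
  dt.train <- dt.data[date < testDate, v.modelCols, with = FALSE]
  dt.test  <- dt.data[date == testDate, v.modelCols, with = FALSE]

  fit <- stats::glm(DE ~ . - 1, data = dt.train,
                    family = quasi(link = "identity", variance = "constant"))

  ## Store the forecast day, the actual value and the one-step-ahead prediction
  l.results[[i]] <- data.table(date = testDate,
                               actual = dt.test$DE,
                               prediction = as.numeric(predict(fit, newdata = dt.test)))
}

## One row per run
dt.results <- rbindlist(l.results)
dt.results[, error := actual - prediction]

## Summary statistics of the individual forecast errors
print(dt.results[, .(mean = mean(error), median = median(error), sd = sd(error),
                     mae = mean(abs(error)), rmse = sqrt(mean(error^2)))])

## Cumulative sum of absolute errors over time, and histogram of the errors
plot(dt.results$date, cumsum(abs(dt.results$error)), type = "l",
     xlab = "date", ylab = "cumulative absolute error")
hist(dt.results$error, main = "Day-ahead forecast errors", xlab = "actual - prediction")
```

Keeping the per-run results in a list and calling `rbindlist()` once at the end leaves the loop body simple, and `dt.results` then holds everything needed for further plots or metrics.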