
I have the following data table and want to predict DE prices from its other variables using a GLM (Generalized Linear Model).

library(data.table)

set.seed(123)
dt.data <- data.table(date = seq(as.Date('2019-01-01'), by = '1 day', length.out = 731),
                      'DE' = rnorm(731, 30, 1), 'windDE' = rnorm(731, 10, 1),
                      'consumptionDE' = rnorm(731, 50, 1), 'nuclearDE' = rnorm(731, 8, 1), 
                      'solarDE' = rnorm(731, 1, 1), check.names = FALSE)


dt.forecastData <- dt.data
dt.forecastData <- na.omit(dt.forecastData)


fromTestDate <- "2019-12-31"
fromDateTest <- base::toString(fromTestDate)      


## Create train and test date-vectors depending on fromDateTest: ##
v.train <- which(dt.forecastData$date <= fromDateTest)
v.test <- which(dt.forecastData$date == as.Date(fromDateTest)+1)

## Create data tables for train and test data with specific date range (fromTestDate): ##
dt.train <- dt.forecastData[v.train]
v.trainDate <- dt.train$date
dt.test <- dt.forecastData[v.test]
v.testDate <- dt.test$date

## Delete column "date" of train and test data for model fitting: ##
dt.train <- dt.train[, c("date") := NULL]
dt.test <- dt.test[, c("date") := NULL]


## MODEL FITTING: ##
## Generalized Linear Model: ##
xgbModel <- stats::glm(DE ~ .-1, data = dt.train, 
                       family = quasi(link = "identity", variance = "constant"))


## Train and Test Data PREDICTION with xgbModel: ##
dt.train$prediction <- stats::predict.glm(xgbModel, dt.train)
dt.test$prediction <- stats::predict.glm(xgbModel, dt.test)


## Add date columns to dt.train and dt.test: ##
dt.train <- data.table(date = v.trainDate, dt.train)
dt.test <- data.table(date = v.testDate, dt.test)

In this code I train the model on the data from 2019-01-01 to 2019-12-31 and test it on the day-ahead forecast for 2020-01-01. Now I want to create a for-loop that runs my model 365 times in total, as follows:

Run 1:

a) use 01-01-2019 to 31-12-2019 to train my model

b) predict for 01-01-2020 (test data)

c) use the actual data point for 01-01-2020 to evaluate the prediction

Run 2:

a) use 01-01-2019 to 01-01-2020 to train my model

b) predict for 02-01-2020

c) use the actual data point for 02-01-2020 to evaluate the prediction

etc.

In the end, I want to plot, for example, the cumulative sum of the individual prediction performances, or a histogram of the individual prediction performances together with some summary statistics (mean, median, sd, etc.).

Unfortunately, I don't know how to start the loop or where to save the predictions of each run. I hope someone can help me with this!

MikiK

1 Answer

Basically, you have to construct a vector that contains the end date of the training period for each run. In each iteration of the loop you then pick one of these end dates, fit the model and predict one day ahead. Using your code, it could look like this:

library(data.table)

set.seed(123)
dt.data <- data.table(date = seq(as.Date('2019-01-01'), by = '1 day', length.out = 731),
                      'DE' = rnorm(731, 30, 1), 'windDE' = rnorm(731, 10, 1),
                      'consumptionDE' = rnorm(731, 50, 1), 'nuclearDE' = rnorm(731, 8, 1), 
                      'solarDE' = rnorm(731, 1, 1), check.names = FALSE)


dt.forecastData <- dt.data
dt.forecastData <- na.omit(dt.forecastData)

Here, I construct a vector holding all days between Dec 31 2019 and Jan 15 2020 (adapt the range as needed):

# vector of all end dates
eval.dates <- seq.Date(from = as.Date("2019-12-31"), 
                       to   = as.Date("2020-01-15"),
                       by   = 1)

Here, I create a numeric vector to store the one-day-ahead predictions:

# storage vector for all predictions
test.predictions  <- numeric(length = length(eval.dates))

Now, run the loop using your code and pick one of the end dates in each iteration:

for (ii in seq_along(eval.dates)) { # loop start

  fromTestDate <- eval.dates[ii] # get end date for iteration
  fromDateTest <- base::toString(fromTestDate)

  ## Create train and test date-vectors depending on fromDateTest: ##
  v.train <- which(dt.forecastData$date <= fromDateTest)
  v.test <- which(dt.forecastData$date == as.Date(fromDateTest) + 1)

  ## Create data tables for train and test data with specific date range (fromTestDate): ##
  dt.train <- dt.forecastData[v.train]
  v.trainDate <- dt.train$date
  dt.test <- dt.forecastData[v.test]
  v.testDate <- dt.test$date

  ## Delete column "date" of train and test data for model fitting: ##
  dt.train <- dt.train[, c("date") := NULL]
  dt.test <- dt.test[, c("date") := NULL]

  ## MODEL FITTING: ##
  ## Generalized Linear Model: ##
  xgbModel <- stats::glm(DE ~ . - 1, data = dt.train,
                         family = quasi(link = "identity", variance = "constant"))

  ## One-day-ahead test data PREDICTION with xgbModel: ##
  test.predictions[ii] <- stats::predict.glm(xgbModel, dt.test)

  # verbose: print the index of the finished run
  print(ii)

} # loop end

As you can see, this is a slightly shortened version of your code; I omitted the predictions for the training set for brevity. They can be added along the lines of your original code.
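If the training-set predictions are needed as well, one option is a matrix with one column per run, because each run produces a whole vector of in-sample predictions rather than a single number. The following is only a sketch under assumptions: the names `dt`, `n.max` and `train.predictions` are hypothetical and not part of the code above, the date range is shortened to four runs, and rows that a run has not yet seen are left as `NA`.

```r
library(data.table)

# Same simulated data as in the question
set.seed(123)
dt <- data.table(date = seq(as.Date("2019-01-01"), by = "1 day", length.out = 731),
                 DE = rnorm(731, 30, 1), windDE = rnorm(731, 10, 1),
                 consumptionDE = rnorm(731, 50, 1), nuclearDE = rnorm(731, 8, 1),
                 solarDE = rnorm(731, 1, 1))

# Shortened range of end dates (assumption: four runs are enough to illustrate)
eval.dates <- seq.Date(as.Date("2019-12-31"), as.Date("2020-01-03"), by = 1)

# Largest training sample = all rows up to the last end date
n.max <- sum(dt$date <= max(eval.dates))

# Rows = training dates, columns = runs; NA where a date is not in that run's sample
train.predictions <- matrix(NA_real_, nrow = n.max, ncol = length(eval.dates),
                            dimnames = list(as.character(dt$date[seq_len(n.max)]),
                                            as.character(eval.dates)))

for (ii in seq_along(eval.dates)) {
  train <- dt[date <= eval.dates[ii]]   # subsetting returns a copy of the rows
  train[, date := NULL]                 # so dropping "date" leaves dt untouched

  fit <- stats::glm(DE ~ . - 1, data = train,
                    family = quasi(link = "identity", variance = "constant"))

  # In-sample predictions of run ii fill the first nrow(train) rows of column ii
  train.predictions[seq_len(nrow(train)), ii] <-
    as.numeric(stats::predict(fit, newdata = train))
}
```

Each column then holds the in-sample predictions of one run, so `train.predictions[, 1]` covers 2019-01-01 to 2019-12-31 and later columns each add one more day. The values differ slightly between columns because the model is refitted in every run.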

You did not specify which measures you want to use to evaluate your out-of-sample predictions. The object test.predictions holds all your one-step-ahead predictions and you can use this to compute RMSEs, LPS or whatever quantification of predictive power that you'd like to use.
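As one possible way to evaluate the forecasts, the predictions can be stored next to the actual DE prices of the forecast days, after which error measures, summary statistics and plots follow directly. This is a sketch under assumptions, not the only way to do it: `dt`, `run.one`, `dt.eval`, `rmse`, `mae` and `cum.sq.err` are hypothetical names introduced here, and squared error is just one choice of performance measure.

```r
library(data.table)

# Same simulated data as in the question
set.seed(123)
dt <- data.table(date = seq(as.Date("2019-01-01"), by = "1 day", length.out = 731),
                 DE = rnorm(731, 30, 1), windDE = rnorm(731, 10, 1),
                 consumptionDE = rnorm(731, 50, 1), nuclearDE = rnorm(731, 8, 1),
                 solarDE = rnorm(731, 1, 1))

eval.dates <- seq.Date(as.Date("2019-12-31"), as.Date("2020-01-15"), by = 1)

# Hypothetical helper: fit on all data up to eval.dates[ii], predict the next day
run.one <- function(ii) {
  end.date <- eval.dates[ii]
  train <- dt[date <= end.date]         # copy of the training rows
  train[, date := NULL]                 # drop "date" so it is not used as a regressor
  test <- dt[date == end.date + 1]      # the single day to forecast

  fit <- stats::glm(DE ~ . - 1, data = train,
                    family = quasi(link = "identity", variance = "constant"))

  data.table(date       = test$date,
             prediction = as.numeric(stats::predict(fit, newdata = test)),
             actual     = test$DE)
}

# One row per run: forecast date, prediction and realised DE price
dt.eval <- rbindlist(lapply(seq_along(eval.dates), run.one))
dt.eval[, error := prediction - actual]

# Overall error measures across all runs
rmse <- sqrt(mean(dt.eval$error^2))
mae  <- mean(abs(dt.eval$error))

# Summary statistics of the individual forecast errors
summary(dt.eval$error)
sd(dt.eval$error)

# Cumulative sum of squared errors and histogram of the errors
cum.sq.err <- cumsum(dt.eval$error^2)
plot(dt.eval$date, cum.sq.err, type = "l",
     xlab = "Forecast date", ylab = "Cumulative squared error")
hist(dt.eval$error, main = "One-day-ahead forecast errors", xlab = "Error")
```

Keeping the forecast date, prediction and actual value in one table makes it easy to swap in a different measure later, since every measure is just a function of the `prediction` and `actual` columns.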

yrx1702
  • To evaluate my day-ahead forecasts I would like to compare them with the actual DE prices, which are in the second column of the data table `dt.data`. – MikiK Jan 02 '21 at 09:40
  • How would this work for the train data set? I have already tried it, but I always get the same values for each date of the train data set. – MikiK Jan 02 '21 at 10:04
  • For the train data set, you'd need to construct a matrix or an array, because you do more than one prediction per iteration of the loop, hence a vector is not suitable anymore. – yrx1702 Jan 02 '21 at 13:10
  • Unfortunately, I don't get it for the train data set.... – MikiK Jan 02 '21 at 19:29
  • How do I get the overall RMSE of the iterations? – MikiK Jan 04 '21 at 06:23
  • See "Formula" in https://en.m.wikipedia.org/wiki/Root-mean-square_deviation – yrx1702 Jan 04 '21 at 07:37
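
For reference, the overall RMSE across the runs can be written as follows. The symbols are assumptions introduced here: T is the number of runs, \hat{y}_t is the one-day-ahead prediction of run t (the t-th entry of test.predictions) and y_t is the actual DE price on the day that run forecasts.

```latex
\[
  \mathrm{RMSE} = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left( \hat{y}_t - y_t \right)^2}
\]
```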