I'm building a multifactorial sales forecasting model in RStudio using XGBoost regression. I have built lag features with this function:
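```r
library(dplyr)

# Adds one lagged copy of Sales per lag (e.g. lag_04, lag_05, ...) within each
# Channel/Product series. seq() includes both ends, so num_lags + 1 lags are made.
Create_lags <- function(MyData, start_index_lag, num_lags) {
  lags <- seq(from = start_index_lag, to = start_index_lag + num_lags)
  lag_names <- paste("lag", formatC(lags, width = nchar(max(lags)), flag = "0"),
                     sep = "_")
  lag_functions <- setNames(paste("dplyr::lag(.,", lags, ")"), lag_names)
  print(lag_functions)
  MyData <- MyData %>%
    arrange(Channel, Product) %>%   # assumes rows are already in date order within each group
    group_by(Channel, Product) %>%
    mutate_at(vars(Sales), funs_(lag_functions))  # funs_() is deprecated in newer dplyr
  print(colnames(MyData))
  return(MyData)
}
```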
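For comparison, here is a minimal base-R sketch of the same grouped lag features without `funs_()`. The helpers `shift_down` and `make_lag_columns` are hypothetical names, and like the original function it assumes rows are already in date order within each Channel/Product group.

```r
# Hypothetical helper: shift x down by n positions, padding the start with NA
# (the same idea as dplyr::lag(x, n)).
shift_down <- function(x, n) {
  len <- length(x)
  if (n >= len) return(rep(NA_real_, len))
  c(rep(NA_real_, n), x[seq_len(len - n)])
}

# Hypothetical helper: one lag_<k> column per lag, computed separately for
# each Channel/Product series. Assumes rows are in date order within groups.
make_lag_columns <- function(df, lags) {
  df <- df[order(df$Channel, df$Product), ]   # order() leaves ties in original order
  width <- nchar(max(lags))
  for (k in lags) {
    col <- paste("lag", formatC(k, width = width, flag = "0"), sep = "_")
    df[[col]] <- ave(df$Sales, df$Channel, df$Product,
                     FUN = function(s) shift_down(s, k))
  }
  df
}

sales <- data.frame(Channel = c("A", "A", "A", "B", "B", "B"), Product = "p1",
                    Sales = c(1, 2, 3, 10, 20, 30))
make_lag_columns(sales, 1:2)
```

Each group gets its own leading `NA`s, so channel B's first rows never read values from channel A.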
This works fine. I have also built rolling means and standard deviations with the functions below:
```r
library(dplyr)
library(RcppRoll)  # provides roll_meanr() and roll_sdr()

# Rolling mean of Sales over windows of k days ending on the previous day
# (the outer dplyr::lag(..., 1)), computed within each Channel/Product series.
Create_rolling_window_means <- function(MyData, start_index_rollfeat, num_rollfeat) {
  rollmean_1 <- seq(from = start_index_rollfeat, to = start_index_rollfeat + num_rollfeat)
  rollmean_names <- paste("rollmean", formatC(rollmean_1,
                          width = nchar(max(rollmean_1)), flag = "0"),
                          sep = "")
  rollmean_functions <- setNames(paste("dplyr::lag(roll_meanr(.,", rollmean_1, ")", ",1)"),
                                 rollmean_names)
  print(rollmean_functions)
  MyData <- MyData %>%
    arrange(Channel, Product) %>%
    group_by(Channel, Product) %>%
    mutate_at(vars(Sales), funs_(rollmean_functions))
  print(colnames(MyData))
  return(MyData)
}

# Same as above, using the rolling standard deviation instead of the mean.
Create_rolling_window_sd <- function(MyData, start_index_rollfeat, num_rollfeat) {
  rollsd_1 <- seq(from = start_index_rollfeat, to = start_index_rollfeat + num_rollfeat)
  rollsd_names <- paste("rollsd", formatC(rollsd_1,
                        width = nchar(max(rollsd_1)), flag = "0"),
                        sep = "")
  rollsd_functions <- setNames(paste("dplyr::lag(roll_sdr(.,", rollsd_1, ")", ",1)"),
                               rollsd_names)
  print(rollsd_functions)
  MyData <- MyData %>%
    arrange(Channel, Product) %>%
    group_by(Channel, Product) %>%
    mutate_at(vars(Sales), funs_(rollsd_functions))
  print(colnames(MyData))
  return(MyData)
}
```
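To see what `lag(roll_meanr(., k), 1)` and `lag(roll_sdr(., k), 1)` compute per row, here is a base-R sketch with a hypothetical helper `lagged_roll_stat`: each row gets the statistic over the previous `k` values only, so a day's own Sales never enters its own feature. Base `sd()` uses the n - 1 denominator; whether `roll_sdr()` does the same is worth confirming in the RcppRoll documentation.

```r
# Statistic over the previous `n` values, excluding the current row.
# Intended to mirror lag(roll_meanr(x, n), 1) and lag(roll_sdr(x, n), 1).
lagged_roll_stat <- function(x, n, FUN = mean) {
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    if (i > n) out[i] <- FUN(x[(i - n):(i - 1)])
  }
  out
}

sales <- c(1, 2, 3, 4, 5, 6)
lagged_roll_stat(sales, 3)       # NA NA NA 2 3 4
lagged_roll_stat(sales, 3, sd)   # NA NA NA 1 1 1
```

This also makes the forecasting problem visible: the feature for the first future day only needs known values, but from the second future day onward the window includes a day that has no actual Sales yet.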
This works fine for a single future data point, but I'm in the situation below (an Excel example with a 3-period rolling mean):
[Image: Excel example of a 3-period rolling mean]
So I can only predict one future data point. What I think I need is to change the function so that, in a loop, it uses the predicted rolling mean as historic data whenever the actual historic data point is not available, filling 45 future data points (45 days), as in the example below:
[Image: Excel example of the desired result]
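What this describes is usually called recursive (iterative) multi-step forecasting: predict day 1, append that prediction to the series as if it were an actual value, rebuild the lag and rolling features, predict day 2, and so on. Below is a minimal base-R sketch for a single Channel/Product series. The names `recursive_forecast`, `predict_one`, `history` and `Sales_filled` are hypothetical; it assumes `history` holds actual daily Sales in date order and that `predict_one()` returns one number for a one-row data frame of features. Feature names follow the padding scheme of the functions above.

```r
# Recursive multi-step forecast for ONE series (hypothetical sketch).
# history:     actual daily Sales, oldest first
# lags:        lag lengths, e.g. 4:64
# windows:     rolling-window lengths, e.g. 4:64
# predict_one: function(one-row data.frame of features) -> single number
recursive_forecast <- function(history, horizon, lags, windows, predict_one) {
  pad <- function(k, set) formatC(k, width = nchar(max(set)), flag = "0")
  series <- as.numeric(history)   # grows by one prediction per step
  n_hist <- length(series)
  for (h in seq_len(horizon)) {
    t <- n_hist + h               # position of the day being forecast
    feats <- list()
    for (k in lags) {             # lag_k = value k days before day t
      feats[[paste("lag", pad(k, lags), sep = "_")]] <-
        if (t - k >= 1) series[t - k] else NA_real_
    }
    for (k in windows) {          # statistics over days t-k .. t-1 (excludes t)
      idx <- (t - k):(t - 1)
      ok <- min(idx) >= 1
      feats[[paste0("rollmean", pad(k, windows))]] <- if (ok) mean(series[idx]) else NA_real_
      feats[[paste0("rollsd", pad(k, windows))]]   <- if (ok) sd(series[idx]) else NA_real_
    }
    # The prediction becomes "history" for the next day's features.
    series[t] <- predict_one(as.data.frame(feats))
  }
  data.frame(day = seq_along(series),
             is_forecast = seq_along(series) > n_hist,
             Sales_filled = series)
}

# Toy usage with a naive stand-in model: "predict the 3-day rolling mean".
recursive_forecast(c(3, 6, 9), horizon = 2, lags = 1, windows = 3,
                   predict_one = function(f) f$rollmean3)
```

With a fitted booster, `predict_one` could be something like `function(f) predict(model, as.matrix(f[, feature_names, drop = FALSE]))`, where `model` and `feature_names` (the training columns, in training order) are placeholders; any other features the model uses would need to be added to `feats` too. You would run this once per Channel/Product group, for example via `split()` and `lapply()`. Because `Sales_filled` mixes actuals and predictions, computing the lagged rolling mean and standard deviation over it gives one continuous column per window.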
My final result should be a single column filled with the values from the last column (and the same for the standard deviation), which I can then use as a variable in my model. For additional context, I'm using these values:
```r
start_index_lag = 4
num_lags = 60
start_index_rollfeat = 4
num_rollfeat = 60
forecast_horizon = 45  # 45 days
```
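With these settings, a quick calculation shows which features start depending on predictions during the 45-day horizon. It is a sketch assuming day 0 is the last date with actual Sales and the lag/rolling definitions used in the functions above; `lags_needing_preds` and `pred_share` are hypothetical names. It also shows that `seq()` includes both ends, so `num_lags = 60` actually yields 61 lags.

```r
start_index_lag <- 4
num_lags <- 60
start_index_rollfeat <- 4
num_rollfeat <- 60
forecast_horizon <- 45

lags    <- seq(from = start_index_lag, to = start_index_lag + num_lags)
windows <- seq(from = start_index_rollfeat, to = start_index_rollfeat + num_rollfeat)
length(lags)      # 61: lags 4..64, i.e. num_lags + 1

# Forecast day h = 1 is the first day without an actual value.
# lag_k on day h reads day h - k, which is a prediction only if h - k >= 1,
# so by the last horizon day only lags below the horizon involve predictions.
lags_needing_preds <- lags[lags < forecast_horizon]
range(lags_needing_preds)   # 4 44: lags 45..64 always read actual data

# A k-day window ending on day h - 1 contains min(h - 1, k) predicted values.
pred_share <- function(h, k) min(h - 1, k) / k
pred_share(1, 4)                   # 0: day 1 uses only actuals
pred_share(forecast_horizon, 4)    # 1: the 4-day window is all predictions
pred_share(forecast_horizon, 64)   # 0.6875 (44 of 64 values predicted)
```

So the lag features need no predictions for the first four forecast days, while the rolling features (lagged by only one day) depend on predictions from the second day onward; that is why the recursive loop is needed for the rolling mean and standard deviation in particular.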