I am working on a forecasting model for monthly data which I intend to use in SQL server 2016 (in-database).
I created a simple TBATS model for testing:
dataset <- msts(data = dataset[,3],
start = c(as.numeric(dataset[1,1]),
as.numeric(dataset[1,2])),
seasonal.periods = c(1,12))
dataset <- tsclean(dataset,
replace.missing = TRUE,
lambda = BoxCox.lambda(dataset,
method = "loglik",
lower = -2,
upper = 1))
dataset <- tbats(dataset,
use.arma.errors = TRUE,
use.parallel = TRUE,
num.cores = NULL
)
dataset <- forecast(dataset,
level =c (80,95),
h = 24)
dataset <- as.data.frame(dataset)
Dataset was imported from .csv file I created with SQL query.
Later, I used same code in SQL server, input being the same query I used for .csv file (meaning data was exactly the same aswell)
However, when I executed the script, I noticed I got different results. All numbers look fine and make perfect sense, both SQL and standalone R give a forecast table, but all numbers between two tables differ for few % (about 3% on average).
Is there an explanation for this? It really bothers me as I need best possible results.
EDIT: This is how my data looks for easier understanding. It's basically 3 column table: year, month, value of transactions (numbers are randomised because data is classified). Alltogether I have data for 9 years.
2008 11 1093747561919.38
2008 12 816860005030.31
2009 1 341394536377.06
2009 2 669993867646.25
2009 3 717585597605.75
2009 4 627553319006.03
2009 5 984146176491.78
2009 6 605488762214.33
2009 7 355366795222.40
2009 8 549252969698.07
2009 9 598237364101.23
This is an example of results. Top two rows are from SQL server, bottom two rows are from RStudio.
t Point Lo80 Hi80
1 872379.7412 557105.271 1187654.211
2 1093817.266 778527.1078 1409107.424
1 806050.6884 517606.464 1094494.913
2 1031845.483 743387.015 1320303.95
EDIT 2: I checked each part of code carefully and I figured out that difference in results happens at TBATS model.
SQL server returns: TBATS(0.684, {0,0}, -, {<12,5>})
RStudio returns: TBATS(0.463, {0,0}, -, {<12,5>})
This explains difference in forecast values, but the question remains as these should be the same.