Do you have any reading recommendations on correcting forecast bias? For example, I use an ARIMA model to predict a time series. Is there a way, based on the backtesting results, to correct the bias of the forecast?

Welcome to the world of StackOverflow. You might already have seen that some moderators are "keen" on penalising posts that do not meet the StackOverflow standard of a Minimal, Complete, Verifiable Example of code (a.k.a. an MCVE-related question). You might opt to update / edit your question so as to meet such practice (ideally before any such adverse effect takes place). The best thing would be to read the StackOverflow do's & don'ts, so as to learn what community rules have been set and to find your own way of living within them. **Anyway, enjoy being a new contributing member of StackOverflow** – user3666197 Oct 31 '15 at 19:02
1 Answer
How to handle the ever-present Bias / Overfit struggle?
Using a tactical methodology: one principal approach is to systematically tune a Predictor (be it ARIMA or something else) via a two-step approach.

Split the available DataSET into two parts, so as to emulate a near "Future": "hide" the second part of the DataSET -- say about 20-30% of the observations -- from the process of [1] Training, and use it in a step [2] called CrossValidation of the predictions.

This methodology lets one search both the StateSPACE of a Predictor engine's configurations and the data-related bias/overfit. Some use only the former part of the minimiser search (lowest error / highest utility function), some only the latter (as in Leo Breiman's RandomForest modification of ensemble-based methods), and some use both.
- Train a pre-configured Predictor on `aTrainingSubPartOfAvailableDataSET`.
- Once such a configuration of the Predictor has been trained, cross-validate this configuration's ability to predict against `aCrossValidationSubPartOfAvailableDataSET`, which was not seen in the process of training (Step 1), so as to observe the Bias / Overfit artefacts and proceed towards the lowest Cross-Validation error / best-generalisation area of plausible configuration settings (a minimal sketch of both steps follows below).
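
A minimal sketch of those two steps, assuming Python's statsmodels `ARIMA` as the Predictor (statsmodels comes up later in this thread as a Python option); the synthetic series, the 80/20 split, the candidate `(p, d, q)` orders and the RMSE error measure are all illustrative assumptions, not anything prescribed by the answer:

```python
# Step [1]: train each candidate configuration on the training sub-part only.
# Step [2]: cross-validate it on the held-back ("hidden") sub-part and keep
#           the configuration with the lowest cross-validation error.
# Assumptions: statsmodels >= 0.12, a univariate pandas Series, RMSE as the metric.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = pd.Series(rng.normal(size=200)).cumsum()          # stand-in time series

split = int(len(y) * 0.8)                             # "hide" the last ~20 %
train, valid = y.iloc[:split], y.iloc[split:]         # aTrainingSubPart.. / aCrossValidationSubPart..

candidate_orders = [(1, 1, 0), (0, 1, 1), (1, 1, 1), (2, 1, 1)]   # illustrative StateSPACE
cv_error = {}
for order in candidate_orders:
    fitted = ARIMA(train, order=order).fit()          # Step [1]: training part only
    forecast = np.asarray(fitted.forecast(steps=len(valid)))
    cv_error[order] = np.sqrt(np.mean((valid.to_numpy() - forecast) ** 2))   # Step [2]

best_order = min(cv_error, key=cv_error.get)
print("per-order CV RMSE:", cv_error)
print("lowest CV error / best generalisation candidate:", best_order)
```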

Thanks! I am doing the cross-validation with a backtesting exercise (like a leave-one-out exercise), and then I run a simulation (like a test data set), so I divided the data into 3 parts. But I am wondering whether just using error measures is the correct approach; I should be able to forecast some of the forecast error. I am saying this ruling out confidence intervals. – donpresente Oct 31 '15 at 20:09
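
One rough sketch of that kind of expanding-window backtest, and of turning the collected backtest errors into a simple additive correction of the next forecast: statsmodels `ARIMA`, the fixed `(1, 1, 1)` order, the one-step horizon and the mean-error adjustment are all assumptions made for illustration, not the exact procedure discussed here, and whether a mean-error shift is an adequate bias correction has to be justified by the backtest itself:

```python
# Rolling-origin backtest: refit on an expanding window, forecast one step
# ahead, record (actual - forecast), then use the mean backtest error as a
# crude additive bias correction of the next out-of-sample forecast.
# Assumptions: statsmodels ARIMA, a univariate pandas Series, order (1, 1, 1).
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
y = pd.Series(rng.normal(size=120)).cumsum() + 0.05 * np.arange(120)

start = 90                                         # first forecast origin
errors = []
for origin in range(start, len(y)):
    fitted = ARIMA(y.iloc[:origin], order=(1, 1, 1)).fit()
    one_step = float(np.asarray(fitted.forecast(steps=1))[0])
    errors.append(float(y.iloc[origin]) - one_step)          # actual minus forecast

bias_estimate = float(np.mean(errors))             # average backtest error
final_fit = ARIMA(y, order=(1, 1, 1)).fit()
raw_forecast = float(np.asarray(final_fit.forecast(steps=1))[0])
corrected_forecast = raw_forecast + bias_estimate  # simple additive correction
print(f"estimated bias: {bias_estimate:+.3f}, raw: {raw_forecast:.3f}, "
      f"corrected: {corrected_forecast:.3f}")
```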
@donpresente **Oh yes, this is definitely possible.** Once your methodology keeps the process of **separation** fair between **`aTrainingSubPartOfAvailableDataSET`**, used for the initial training, and a part emulating Out-of-sample examples, used for validation so as to get the best learner (a generalisation-capable Predictor), one might employ **Hoeffding's Inequality**, which exactly limits the errors of such a trained Predictor's future predictions. – user3666197 Nov 23 '15 at 08:05
Is that bound tight? Doesn't it make the assumption that the errors are Gaussian? – donpresente Nov 23 '15 at 18:48
The Hoeffding bound formulates an upper bound on the probability that an out-of-sample-example prediction will result in an error greater than a certain "tolerable" threshold. It makes no assumption about the distribution thereof; the certainty that this probability progressively decreases is the "weapon" for tightening. – user3666197 Nov 23 '15 at 21:29
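
A tiny numerical illustration of that progressive tightening, assuming the textbook single-hypothesis form of the bound, 2 · exp(-2 · EPSILON² · N), for errors scaled into [0, 1]; neither the constants nor the EPSILON = 0.05 tolerance come from this thread:

```python
# How the Hoeffding upper bound on Pr(|E_in - E_oos| > EPSILON) shrinks as the
# number N of out-of-sample validation points grows, with no distributional
# assumption on the errors (only boundedness, here taken as [0, 1]).
import math

EPSILON = 0.05                                     # arbitrary "tolerable" threshold
for n in (100, 500, 1_000, 5_000, 10_000):
    bound = 2.0 * math.exp(-2.0 * EPSILON ** 2 * n)
    print(f"N = {n:>6}: Pr(|E_in - E_oos| > {EPSILON}) <= {bound:.3g}")
```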
Btw, could you, @donpresente, kindly explain whether your ARIMA implementation is an OSS-published or a home-brew one? – user3666197 Nov 25 '15 at 13:06
Sure. Using Arima() or auto.arima() from the forecast package. Is the correction that you make "plus or minus something (a quantity)", or is it more elaborate? Thanks for your help. – donpresente Nov 26 '15 at 09:09
I knew about the R package for ARIMA; I am interested in a Python or C port thereof. – user3666197 Nov 26 '15 at 12:04
The **Hoeffding bound** expresses an upper bound on the probability Pr( | E_in - E_oos | > EPSILON ) that a prediction error E_oos, for an out-of-sample example, could fall farther than a tolerable prediction error EPSILON from the known training error E_in elaborated during the learner's training process. – user3666197 Nov 26 '15 at 12:10
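
Written out, the bound referenced in this thread corresponds to the standard single-hypothesis Hoeffding statement (assuming the per-example errors are scaled into [0, 1] and N out-of-sample points are used; the explicit constants below come from the textbook form, not from the comments themselves):

```latex
\Pr\!\left( \left| E_{\mathrm{in}} - E_{\mathrm{oos}} \right| > \epsilon \right)
\;\le\; 2\, e^{-2 \epsilon^{2} N}
```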
Have you used http://statsmodels.sourceforge.net/ for Python? I did not try it, because I use Python only for text mining or classification problems. – donpresente Nov 26 '15 at 12:52
Thanks for copying the bound. Yes, I see your point. But you make a correction to the forecast, right, given the fact that you bound the error? That was my initial question for you, about forecast correction. – donpresente Nov 26 '15 at 13:01