I want to write a function that takes a time series and a standard deviation as parameters and returns an adjusted time series that looks like a forecast.

With this function I want to test the stability of a system that takes a forecast weather time series as an input parameter.

My approach for such a function is shown below:

vector<tuple<datetime, double>> get_adjusted_timeseries(const vector<tuple<datetime, double>>& timeseries_original, const double stddev, const double dist_mid)
{
    // Work on a copy so that the original series stays untouched.
    auto timeseries_copy(timeseries_original);

    // Start with a random sign: +1 or -1.
    int sign = randInRange(0, 1) == 0 ? 1 : -1;

    // Deviations between these limits count as "close to the original value".
    auto left_limit = normal_cdf_inverse(0.5 - dist_mid, 0, stddev);
    auto right_limit = normal_cdf_inverse(0.5 + dist_mid, 0, stddev);

    for (auto& pair : timeseries_copy)
    {
        // Redraw until the deviation has the current sign.
        double nd_value;
        do
        {
            nd_value = normal_distribution_r(0, stddev);
        }
        while ((sign == -1 && nd_value > 0.0) || (sign == 1 && nd_value < 0.0));

        // Shift the value by nd_value percent of itself.
        pair = make_tuple(get<0>(pair), get<1>(pair) + (nd_value / 100) * get<1>(pair));

        // Flip the sign when the deviation was small.
        if ((nd_value > 0.0 && nd_value < right_limit) || (nd_value < 0.0 && nd_value > left_limit))
        {
            sign = sign == -1 ? 1 : -1;
        }
    }

    return timeseries_copy;
}
  • Make a copy of the original time series, which is also of type vector<tuple<datetime, double>>.
  • Get a random number that is either 0 or 1 and use it to set the sign.
  • Use the inverse cumulative distribution function (inverse CDF) to get the limits that indicate when the sign is changed. The sign is changed when the value of the copied time series is close to the original value. The implementation of the inverse CDF is shown here: [image]
  • Loop over each item in the time series:
    • get a normally distributed value, which should be less than zero when sign == -1 and greater than zero when sign == 1
    • adjust the old value of the time series according to the normally distributed value
    • change the sign if the normally distributed value is close to zero, i.e. the adjusted value is close to the original value.
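
The helpers `randInRange`, `normal_distribution_r` and `normal_cdf_inverse` are not part of the listing above. The following is only a minimal sketch of what they might look like, assuming the standard `<random>` facilities and an inverse CDF found by bisection; the signatures are inferred from the call sites, and `rng` and `normal_cdf` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical shared random engine; the original helpers are not shown.
inline std::mt19937& rng()
{
    static std::mt19937 engine(std::random_device{}());
    return engine;
}

// Uniform integer in the closed range [lo, hi].
inline int randInRange(int lo, int hi)
{
    assert(lo <= hi);
    return std::uniform_int_distribution<int>(lo, hi)(rng());
}

// One draw from a normal distribution with the given mean and stddev.
inline double normal_distribution_r(double mean, double stddev)
{
    assert(stddev > 0.0);
    return std::normal_distribution<double>(mean, stddev)(rng());
}

// Normal CDF, expressed through the complementary error function.
inline double normal_cdf(double x, double mean, double stddev)
{
    return 0.5 * std::erfc(-(x - mean) / (stddev * std::sqrt(2.0)));
}

// Inverse normal CDF by bisection on [mean - 10 stddev, mean + 10 stddev].
// Assumption: p lies in (0, 1) and is not so extreme that the quantile
// falls outside that bracket.
inline double normal_cdf_inverse(double p, double mean, double stddev)
{
    assert(p > 0.0 && p < 1.0 && stddev > 0.0);
    double lo = mean - 10.0 * stddev;
    double hi = mean + 10.0 * stddev;
    for (int i = 0; i < 200; ++i)
    {
        const double mid = 0.5 * (lo + hi);
        if (normal_cdf(mid, mean, stddev) < p)
            lo = mid;
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

With these definitions, `normal_cdf_inverse(0.5 - dist_mid, 0, stddev)` and `normal_cdf_inverse(0.5 + dist_mid, 0, stddev)` are symmetric around zero, so `dist_mid` controls how wide the "close to the original" band is.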

The result for a low standard deviation, for example, can be seen here in yellow: [image]

If the mean absolute percentage error (MAPE) between the two time series is calculated, the following relationship results:

  • stddev: 5 -> MAPE: ~0.04
  • stddev: 10 -> MAPE: ~0.08
  • stddev: 15 -> MAPE: ~0.12
  • stddev: 20 -> MAPE: ~0.16
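
These values are consistent with a simple calculation. Because the sign is forced, the magnitude of each deviation behaves like a half-normal draw with mean stddev * sqrt(2 / pi), and each point's relative error is that magnitude divided by 100. A minimal sketch that checks this, where `expected_mape` and `simulated_mape` are illustrative names and the fixed seed is an assumption made for reproducibility:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

// Expected MAPE (as a fraction) when every value is shifted by
// |N(0, stddev^2)| percent of itself.
double expected_mape(double stddev)
{
    assert(stddev >= 0.0);
    const double pi = std::acos(-1.0);
    return stddev * std::sqrt(2.0 / pi) / 100.0;
}

// Monte Carlo estimate of the same quantity.
double simulated_mape(double stddev, std::size_t samples, unsigned seed = 42)
{
    assert(stddev > 0.0 && samples > 0);
    std::mt19937 engine(seed);
    std::normal_distribution<double> noise(0.0, stddev);
    double sum = 0.0;
    for (std::size_t i = 0; i < samples; ++i)
    {
        // The relative error of one point is |deviation| / 100.
        sum += std::fabs(noise(engine)) / 100.0;
    }
    return sum / static_cast<double>(samples);
}
```

For stddev values of 5, 10, 15 and 20 the formula gives roughly 0.040, 0.080, 0.120 and 0.160, so the MAPE grows linearly with stddev at about 0.008 per unit.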

What do you think of this approach?

Can this function be used to test a system that has to deal with predicted time series?

Bowers
  • Maybe the Signal Processing SE site is better suited to this question – Damien May 21 '19 at 14:08
  • @Damien thanks for the hint, I checked Data Science, Code Review and Signal Processing, but the tags fit here the best. – Bowers May 22 '19 at 08:03
  • I have no idea about the approach but it seems that you'd want `randInRange` to be _good_ for this to work. Is it? `double number;` isn't used? `sign = sign == -1 ? 1 : -1;` can be written as `sign = -sign;` – Ted Lyngmo May 23 '19 at 22:11
  • What do you mean by "look like a forecast", or how do you define it? How are you going to use the generated random time series to test your system? You mentioned that your "system deals with predicted time series"; what does your system do exactly? – RobertBaron May 24 '19 at 18:06
  • Maybe try [codereview.se]? – L. F. May 26 '19 at 12:38
  • I cannot understand the main point of the question. Do you want to forecast a time series, or do you want to evaluate a forecast function? Can you make that clear? I cannot understand why you first produce some fake data and then evaluate your function using them. This evaluation method is useless because you are not assessing your function based on real data. – TonySalimi May 27 '19 at 19:47
  • @RobertBaron The system runs on the basis of an operating plan, which is made for the coming day on the previous day. Therefore, it will be tested to what extent the behavior of the system deviates if the predicted inputs do not correspond exactly to reality. – Bowers May 30 '19 at 19:34
  • I see what you want to do. What you suggest is good. It is effectively adding "white noise" (i.e. normally distributed noise) to, I assume, past real data. If real data are or can be affected by such noise, then you will be able to evaluate what level of noise the system can tolerate. There might be other forms of noise that you may want to test your system with. Do you know these other types of noise that can affect your system? – RobertBaron May 30 '19 at 19:52
  • @RobertBaron Thank you very much for the answer. I was hoping for responses like this. The data that will be predicted for the coming day and that will influence the operational plan are weather and stock exchange data. I think your suggestion to test other noise types is very good. Do you have any idea which noise types would be interesting for these types of data? – Bowers May 31 '19 at 19:58

1 Answer

You want to generate time series data that behave like some existing time series data that you have from real phenomena (weather and stock exchange). That generated time series data will be fed into some system to test its stability.

What you could do is fit some model to your existing data, and then use that model to generate data that follow the model, and hence your existing data. Fitting a model to data yields a set of model parameters and a set of deviations (differences not explained by the model). The deviations may follow some known density function, but not necessarily. Given the model parameters and deviations, you can generate data that look like the original data. Note that if the model does not explain the data well, the deviations will be large, and the data generated with the model will not look like the original data.

For example, if you know your data is linear, you fit a line through them, and your model would be:

y = M x + B + E

where E is a random variable that follows the distribution of the error around the line that fits your data, and where M and B are the model parameters. You can now use that model to generate (x, y) coordinates that are roughly linear. When sampling the random variable E, you can assume that it follows some known distribution such as a normal distribution, or use a histogram of the observed deviations to generate deviations that follow an arbitrary density function.
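
As a rough illustration of the linear case, the sketch below fits M and B by ordinary least squares, estimates the spread of E from the residuals, and then generates new y values. It assumes E is normal with mean zero; `LinearModel`, `fit_line` and `generate_linear` are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical container for the fitted parameters of y = M x + B + E.
struct LinearModel
{
    double m;        // slope M
    double b;        // intercept B
    double sigma_e;  // standard deviation of the residuals E
};

// Ordinary least-squares fit of a line through (x, y).
LinearModel fit_line(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    const double denom = n * sxx - sx * sx;
    assert(denom != 0.0);  // the x values must not all be equal
    const double m = (n * sxy - sx * sy) / denom;
    const double b = (sy - m * sx) / n;

    // Spread of the deviations that the line does not explain.
    double ss = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        const double r = y[i] - (m * x[i] + b);
        ss += r * r;
    }
    return LinearModel{m, b, std::sqrt(ss / n)};
}

// Generate y = M x + B + E for each x, with E ~ N(0, sigma_e^2) (assumption).
std::vector<double> generate_linear(const LinearModel& model,
                                    const std::vector<double>& x,
                                    std::mt19937& engine)
{
    std::vector<double> y;
    y.reserve(x.size());
    for (double xi : x)
    {
        double e = 0.0;
        if (model.sigma_e > 0.0)  // normal_distribution requires stddev > 0
            e = std::normal_distribution<double>(0.0, model.sigma_e)(engine);
        y.push_back(model.m * xi + model.b + e);
    }
    return y;
}
```

To follow an arbitrary density instead of a normal one, the draw of `e` could be replaced by picking one of the observed residuals at random.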

There are several time series models that you could use to fit your weather and stock exchange data. You could look at exponential smoothing. It has several different models. I am sure you can find many other models on Wikipedia.
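
As one possible starting point, here is a minimal sketch of simple exponential smoothing in its error-correction form: the level is updated as l_t = l_{t-1} + alpha * e_t, where e_t = y_t - l_{t-1} is the one-step-ahead error. The function names, the choice of initialising the level with the first observation, and the resampling of observed errors are all assumptions made for illustration; alpha would normally be fitted to the data.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// One-step-ahead errors of simple exponential smoothing,
// with the level initialised to the first observation (assumption).
std::vector<double> ses_residuals(const std::vector<double>& y, double alpha)
{
    assert(!y.empty() && alpha > 0.0 && alpha <= 1.0);
    std::vector<double> residuals;
    residuals.reserve(y.size() - 1);
    double level = y.front();
    for (std::size_t t = 1; t < y.size(); ++t)
    {
        const double e = y[t] - level;  // what the model did not predict
        residuals.push_back(e);
        level += alpha * e;             // error-correction update
    }
    return residuals;
}

// Simulate a new series from the same recursion by drawing each error
// at random from the observed residuals.
std::vector<double> ses_simulate(double initial_level, double alpha,
                                 const std::vector<double>& residuals,
                                 std::size_t length, std::mt19937& engine)
{
    assert(!residuals.empty() && alpha > 0.0 && alpha <= 1.0);
    std::uniform_int_distribution<std::size_t> pick(0, residuals.size() - 1);
    std::vector<double> out;
    out.reserve(length);
    double level = initial_level;
    for (std::size_t t = 0; t < length; ++t)
    {
        const double e = residuals[pick(engine)];
        out.push_back(level + e);
        level += alpha * e;
    }
    return out;
}
```

A typical use would be to compute `ses_residuals` on the historical series and then call `ses_simulate` with the first historical value as `initial_level`. Models with trend or seasonality need additional state and are not covered by this sketch.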

If a model does not fit your data well, you can also treat its parameters as random variables. In our example above, suppose that we have observed data where the slope seems to be changing. We would fit several lines and obtain a distribution for M. We would then sample that variable along with E when generating data.

RobertBaron
  • @Anne Bierhoff While answering another question, I gave an example that shows how to calculate a random time series with the exponential smoothing models. See https://stackoverflow.com/questions/56466979/modify-code-to-get-synthetic-data-that-trends-smoothly/56467512#56467512 – RobertBaron Jun 07 '19 at 21:13
  • Thank you so much for your trouble. How would you proceed to see at what prediction accuracy (in percent) the system breaks and no longer produces meaningful results? – Bowers Jun 11 '19 at 19:28
  • That really depends on your system and on what it is expected to produce as meaningful results. – RobertBaron Jun 19 '19 at 18:42
  • Assuming I have a prediction that matches the actual data with 90% accuracy. If I now want to test how the system behaves with forecasts of 95%, could I simply shift all the individual forecast values by a small percentage in the direction of actual data? This would only reduce or increase the types of noise that occur in the forecast, wouldn't it? – Bowers Oct 01 '19 at 16:45
  • I think that scaling by a percentage will change the probability distribution of the noise. For example, if you assume that the noise is Normal, to get smaller errors, you want to recalculate the noise values with a smaller standard deviation. Scaling by a percentage will not yield the same result. – RobertBaron Oct 02 '19 at 13:28
  • Thank you for your advice. Perhaps it would be better to do the following: calculate the deviation between prediction and actual data for each value and normalize it so that the largest deviation is 0 and the smallest is 1. Then: `adjusted_i = forecast_i + (actual_i - forecast_i) * (normalized_i * p)`, where `p` defines the strength of the correction. In this way, large deviations would be corrected only gradually with an increase of "p". Would this also change the probability distribution of the noise? – Bowers Oct 02 '19 at 16:44
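
The correction proposed in the last comment can be sketched as follows. This is only an illustration of the formula as written, under two assumptions that the comment does not state: the deviation is taken as the absolute difference between actual and forecast, and when all deviations are equal the normalized weight is set to 1. The function name `pull_towards_actual` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// adjusted_i = forecast_i + (actual_i - forecast_i) * (normalized_i * p),
// where normalized_i is 0 for the largest deviation and 1 for the smallest.
std::vector<double> pull_towards_actual(const std::vector<double>& forecast,
                                        const std::vector<double>& actual,
                                        double p)
{
    assert(forecast.size() == actual.size() && !forecast.empty());
    assert(p >= 0.0 && p <= 1.0);

    // Absolute deviation per point (assumption: magnitude, not signed value).
    std::vector<double> dev(forecast.size());
    for (std::size_t i = 0; i < forecast.size(); ++i)
        dev[i] = std::fabs(actual[i] - forecast[i]);

    const auto mm = std::minmax_element(dev.begin(), dev.end());
    const double min_dev = *mm.first;
    const double max_dev = *mm.second;
    const double range = max_dev - min_dev;

    std::vector<double> adjusted(forecast.size());
    for (std::size_t i = 0; i < forecast.size(); ++i)
    {
        // Assumption: with identical deviations every point gets weight 1.
        const double normalized = range > 0.0 ? (max_dev - dev[i]) / range : 1.0;
        adjusted[i] = forecast[i] + (actual[i] - forecast[i]) * normalized * p;
    }
    return adjusted;
}
```

Note that with this weighting the point with the largest deviation is never corrected, whatever the value of `p`, so the largest errors in the series remain unchanged.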