
I received feedback on my paper about stock market forecasting with machine learning, and the reviewer asked the following:

I would like you to statistically test the out-of-sample performance of your methods. Hence 'differ significantly' in the original wording. I agree that some of the figures look awesome visually, but visually, random noise seems to contain patterns. I believe Sortino Ratio is the appropriate statistic to test, and it can be tested by using bootstrap. I.e., a distribution is obtained for both BH and your strategy, and the overlap of these distributions is calculated.
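As I understand it, the reviewer is suggesting something along these lines (a rough Python sketch of my reading of the request; `strategy_returns` and `bh_returns` are placeholder arrays of per-period returns, and the plain i.i.d. resampling here ignores serial dependence, so a block bootstrap would be more defensible for real return series):

```python
import numpy as np

rng = np.random.default_rng(0)

def sortino(returns, target=0.0):
    """Sortino ratio: mean excess return over downside deviation."""
    excess = returns - target
    downside = excess[excess < 0]
    if downside.size == 0:
        return np.inf  # no losing periods in this resample
    return excess.mean() / np.sqrt(np.mean(downside ** 2))

def bootstrap_sortino_diff(strat, bh, n_boot=10_000):
    """Paired bootstrap: resample the same time indices from both series
    and record the difference in Sortino ratios for each resample."""
    n = len(strat)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # plain i.i.d. resampling
        diffs[b] = sortino(strat[idx]) - sortino(bh[idx])
    return diffs

# Placeholder return series so the sketch runs; substitute real returns
strategy_returns = rng.normal(0.0005, 0.01, size=1000)
bh_returns = rng.normal(0.0003, 0.01, size=1000)

diffs = bootstrap_sortino_diff(strategy_returns, bh_returns)
ci = np.percentile(diffs, [2.5, 97.5])  # 95% bootstrap CI for the difference
print("Sortino difference significant at 5%:", not (ci[0] <= 0.0 <= ci[1]))
```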

My problem is that I have never done that for time series data. My validation procedure uses a strategy called walk-forward, where I shift the data in time 11 times, generating 11 different combinations of training and test sets with no overlap (a simplified sketch of this splitting is at the end of this question). So, here are my questions:

1- What would be the best (or most appropriate) statistical test to use, given what the reviewer is asking?

2- If I remember correctly, statistical tests require vectors as input, is that correct? Can I generate a vector containing 11 Sortino ratio values (one for each walk) and then compare them with the baselines? Or should I run my code more than once? I am afraid the latter would be unfeasible given the short time to review.

So, what would be the correct actions to compare machine learning approaches statistically in this time series scenario?
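For reference, here is a simplified sketch of the walk-forward splitting I described above (scikit-learn's `TimeSeriesSplit` is only an approximation, since its training windows expand rather than shift; the data and model below are placeholders so the sketch runs):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

# Placeholder data and model so the sketch runs
X = np.arange(200).reshape(-1, 1).astype(float)
y = np.sin(X.ravel() / 10.0)
model = LinearRegression()

# 11 chronological train/test combinations; test windows never overlap
tscv = TimeSeriesSplit(n_splits=11)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    model.fit(X[train_idx], y[train_idx])
    score = model.score(X[test_idx], y[test_idx])  # R^2 on the held-out window
    print(f"walk {fold}: test R^2 = {score:.3f}")
```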

– mad

1 Answer


By pointing out that random noise seems to contain patterns, the reviewer means that your plots may show nice patterns, but those patterns might just be random noise following some distribution (e.g., uniform random noise), which makes the visual evidence unreliable. It might be a good idea to split the data randomly into k groups and then apply a Z-test or t-test to compare the k groups pairwise.
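For instance, a minimal sketch of such a paired comparison (using `scipy.stats.ttest_rel`, which fits when both methods are evaluated on the same k groups; all scores below are made-up placeholders):

```python
import numpy as np
from scipy import stats

# Made-up per-group scores for two methods on the same k = 5 groups
method_a = np.array([0.42, 0.55, 0.31, 0.47, 0.50])
method_b = np.array([0.35, 0.40, 0.28, 0.33, 0.44])

# Paired t-test, since both methods are scored on the same groups
t_stat, p_value = stats.ttest_rel(method_a, method_b)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```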

The reviewer points to the Sortino ratio, which seems somewhat ambiguous here: since you are targeting a machine learning model for a forecasting task, what you actually care about is forecasting accuracy and reliability, which can be established by using cross-validation; in convex optimization, the equivalent is sensitivity analysis.
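A rough sketch of what that cross-validation could look like with an accuracy-style metric (scikit-learn's `cross_val_score` with a time-ordered splitter; the data and model below are placeholders):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Placeholder data so the sketch runs
X = np.arange(300).reshape(-1, 1).astype(float)
y = np.sin(X.ravel() / 15.0)

scores = cross_val_score(
    LinearRegression(), X, y,
    cv=TimeSeriesSplit(n_splits=5),
    scoring="neg_mean_squared_error",  # forecasting error, not a trading ratio
)
print("MSE per fold:", -scores)
```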


Update

The problem of serial dependence in time series data arises when the series is non-stationary (weak patterns), which does not seem to be the case for your data. Even if it were, it could be addressed by removing the trend, i.e., converting the non-stationary series into a stationary one (the ADF test can be used to check this), and you might also consider ARIMA models.
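A short sketch of that workflow with `statsmodels` (the random-walk data and the ARIMA order below are arbitrary placeholders):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA

# Placeholder price series so the sketch runs (a random walk)
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=500)))

# ADF test: small p-value => reject the unit root (series is stationary)
_, p_value, *_ = adfuller(prices)
print(f"ADF p-value on prices: {p_value:.3f}")

# Differencing (log returns) usually removes the trend; re-test
log_returns = np.diff(np.log(prices))
_, p_value2, *_ = adfuller(log_returns)
print(f"ADF p-value on log returns: {p_value2:.3f}")

# ARIMA with d=1 builds one round of differencing into the model
fit = ARIMA(prices, order=(1, 1, 1)).fit()
print("AIC:", fit.aic)
```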

Time shifting can sometimes be useful, but it is not considered a good measurement of noise; it might, however, help improve model accuracy by shifting the data and extracting features (e.g., mean and variance over a window).
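For example, a small sketch of shift-based feature extraction with pandas (the series and the window length of 20 are placeholders):

```python
import numpy as np
import pandas as pd

# Placeholder return series so the sketch runs
rng = np.random.default_rng(0)
s = pd.Series(rng.normal(0, 0.01, size=300),
              index=pd.date_range("2020-01-01", periods=300))

features = pd.DataFrame({
    "lag_1": s.shift(1),                   # yesterday's value
    "roll_mean_20": s.rolling(20).mean(),  # mean over a 20-step window
    "roll_var_20": s.rolling(20).var(),    # variance over a 20-step window
}).dropna()
print(features.head())
```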

Nothing prevents you from trying the time-shifting approach, but you cannot rely on it as an accurate measurement; you still need to support your statistical analysis with more robust techniques.

– 4.Pi.n
  • Thanks for your answer! With time series data I use the past to predict the future; if I split the data randomly, I will have past and future mixed up in the test data, and possibly in the training data, which would make no sense for time series. What I'm doing is having different training and test sets, but with subtle shifts in time and no overlap, so each of these shifts can be considered a different run of the algorithm. Do you believe I could use each of these shifts to extract the metrics and then compare them using the tests you suggested? Thanks again. – mad Jul 06 '20 at 19:50
  • Thanks again. Which more reliable technique would you recommend for the time-shifting approach, if I choose to use it? – mad Jul 06 '20 at 20:30
  • A statistical test (e.g., the **ADF test**), **ARIMA** models, and extracting the residuals; and for the machine learning model you really need to use **cross-validation**, as it is a robust measure of solution stability – 4.Pi.n Jul 06 '20 at 20:35