Say I have the following dataset:

 time_m = {A:1, B:2, C:3, D:10}; 
 time_n = {A:6, B:2, C:12, D:18}; 
 time_p = {A:1, B:2, C:9, D:17}; 
 time_q = {A:1, B:2, C:9, D:2}. 

As you can see, I have 4 variables A, B, C, D whose values are measured at time points m, n, p, q.

I want to find time points in the data when the variables had the same values again. For example, if I want to maximise the number of variables, the answer is:

{A, B, C} at {time_p, time_q}

Or, if I want to maximise the number of time points, the answer becomes:

{A, B} at {time_m, time_p, time_q}

For a bit more context, say A, B, C, D are stock prices and I want to analyse historical data to find when a subset of stocks returned to the same values.

How can I do this? Is an algorithm for this, or a similar one, implemented anywhere?
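To make the two objectives above concrete, here is a minimal brute-force sketch in Python. It is only an illustration under stated assumptions, not an established algorithm: the names `data`, `shared_groups` and `min_vars` are hypothetical, values are compared for exact equality, and only the largest agreeing set of time points is kept per variable subset. Because B alone is constant across all four time points, the sketch assumes at least two variables are required (`min_vars=2`), which matches the answers expected above. It tries every subset of variables, so it is exponential in the number of variables and only practical for small cases.

```python
from itertools import combinations

# Example data from the question: time point -> {variable: value}.
data = {
    "time_m": {"A": 1, "B": 2, "C": 3, "D": 10},
    "time_n": {"A": 6, "B": 2, "C": 12, "D": 18},
    "time_p": {"A": 1, "B": 2, "C": 9, "D": 17},
    "time_q": {"A": 1, "B": 2, "C": 9, "D": 2},
}


def shared_groups(data, min_vars=2):
    """Map each variable subset to its largest set of agreeing time points.

    Hypothetical helper: brute force over all subsets with at least
    `min_vars` variables (an assumption, see the note on B above).
    """
    variables = sorted(next(iter(data.values())))
    result = {}
    for size in range(min_vars, len(variables) + 1):
        for subset in combinations(variables, size):
            # Bucket time points by the values they take on this subset.
            buckets = {}
            for t, row in data.items():
                key = tuple(row[v] for v in subset)
                buckets.setdefault(key, []).append(t)
            best = max(buckets.values(), key=len)
            if len(best) >= 2:  # the values must occur at least twice
                result[subset] = best
    return result


groups = shared_groups(data)

# Maximise the number of variables, ties broken by more time points.
most_variables = max(groups.items(), key=lambda kv: (len(kv[0]), len(kv[1])))

# Maximise the number of time points, ties broken by more variables.
most_times = max(groups.items(), key=lambda kv: (len(kv[1]), len(kv[0])))

print(most_variables)
print(most_times)
```

For larger data, the problem resembles closed frequent itemset mining if each time point is treated as a transaction of (variable, value) pairs, so an existing itemset-mining implementation may be a better starting point than this enumeration.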

WindChimes
  • How high is the dimensionality of your observation data `{ A, B, C, D, ... }`? This problem scales as **`[m,n]`**, which can be a performance issue for a general solution applied to long-term time-series quantitative modelling across a wider multi-asset landscape covering [stocks, FX-spots, options, futures, ...]. – user3666197 Mar 24 '16 at 18:43
  • Definitely use `Random Forest`; it is probably the easiest and most efficient to implement. `sklearn` also has a nice implementation if you are using Python: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html – Philipp Braun Apr 23 '16 at 10:32

0 Answers