Q-learning with clustered time series

Question

I am new to Q-learning and recently tried to apply it to a problem with 9 states and 2 possible actions. I have a large number of time series, each with only 10 data points, and want to choose between two actions at time t=10. The problem is that the Q matrix has not yet been updated for most states, so the decision is effectively random.

I was considering clustering the time series and getting an averaged Q for each cluster, from which I would choose an action based on the state of each particular series.

The question is whether taking the mean of multiple Q matrices could make sense or if there is any other approach that could be more suitable in this case.
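To make the idea concrete, here is a minimal sketch of what I mean by averaging. It assumes each series already has its own 9x2 Q table stored as a list of lists and that cluster labels come from some separate clustering step; the names `mean_q`, `cluster_q` and `greedy_action` are hypothetical, used only for illustration.

```python
from statistics import fmean

N_STATES, N_ACTIONS = 9, 2  # from the question: 9 states, 2 actions


def mean_q(q_tables):
    """Element-wise mean of a list of 9x2 Q tables (lists of lists)."""
    return [
        [fmean(q[s][a] for q in q_tables) for a in range(N_ACTIONS)]
        for s in range(N_STATES)
    ]


def cluster_q(q_by_series, cluster_of):
    """Group per-series Q tables by cluster label and average each group.

    q_by_series: dict mapping series id -> Q table
    cluster_of:  dict mapping series id -> cluster label (assumed given)
    """
    groups = {}
    for series_id, q in q_by_series.items():
        groups.setdefault(cluster_of[series_id], []).append(q)
    return {label: mean_q(tables) for label, tables in groups.items()}


def greedy_action(q, state):
    """Index of the action with the highest Q value in the given state."""
    return max(range(N_ACTIONS), key=lambda a: q[state][a])
```

Note that a plain mean also averages in entries that were never updated and still hold their initial value, which is part of why I am unsure whether this makes sense.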

Thanks for your help!

Could you provide a little more information? What is the reward structure? Also, the relation between the time series and the states is not clear to me. Could you clarify that too? — Juan Leni, Aug 03 '17 at 12:13
For each product, I have 4 time series, A, B, C and D, with the following (simplified) meanings: A is a simple forecast of C; B is an alternative forecast of C; C is the actual value of the series; D is a more expensive but (theoretically) more accurate forecast of C. — som, Aug 07 '17 at 18:18
The state is defined by comparing A and B with C. For instance, if A is more than 10% above C and B is more than 10% below C, the state is "+-". If A is close to C and B is more than 10% above C, the state is "0+". So there are 9 states in total, one for each possible combination. — som, Aug 07 '17 at 18:18
There are two possible actions: choosing series A or choosing series D in order to forecast C. If A is chosen, the reward is -abs((A-C)/C) and if D is chosen, the reward is -abs((D-C)/C). — som, Aug 07 '17 at 18:19
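The state and reward definitions in the comments above can be sketched as follows. This is only an illustration under stated assumptions: C is taken to be positive, "close to C" is read as within 10% either way, and the function names `sign_label`, `state` and `reward` are hypothetical.

```python
def sign_label(forecast, actual, tol=0.10):
    """Return '+', '-' or '0' for a forecast relative to the actual value.

    Assumes actual > 0; tol=0.10 is the 10% threshold from the comments.
    """
    rel = (forecast - actual) / actual
    if rel > tol:
        return "+"
    if rel < -tol:
        return "-"
    return "0"


def state(a, b, c):
    """Two-character state: label of A vs C followed by label of B vs C."""
    return sign_label(a, c) + sign_label(b, c)


def reward(chosen, c):
    """Negative absolute relative error of the chosen forecast (A or D)."""
    return -abs((chosen - c) / c)
```

For example, `state(115, 85, 100)` gives `"+-"` and `state(102, 120, 100)` gives `"0+"`, matching the two cases described in the comments.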
