
I am confused about the difference between batch and growing batch q learning. Also, if I only have historical data, can I implement growing batch q learning?

Thank you!

ChiefsCreation

1 Answer


In batch Q-learning you only have historical data, with no possibility of acquiring new data by following a given policy. In growing batch Q-learning, by contrast, the algorithm is almost the same, with the difference that in some iterations you use the intermediate policies learned so far to acquire more data, thus growing the batch with new data (which incorporates exploration).
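To make the difference concrete, here is a toy Python sketch of the growing batch loop on a made-up 5-state chain problem. The environment, learning rate, sweep counts, and all names are illustrative, not from any particular reference; pure batch Q-learning would correspond to calling `fit_q` once on a fixed batch and stopping there:

```python
import random
from collections import defaultdict

def env_step(state, action):
    # Toy 1-D chain with states 0..4: action 1 moves right, action 0 moves left.
    # Reward 1 only on reaching the rightmost state.
    next_state = max(0, min(4, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward

def fit_q(batch, gamma=0.9, sweeps=50):
    # "Batch" phase: fit Q purely from the stored transitions,
    # with no further interaction with the environment.
    q = defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s2 in batch:
            target = r + gamma * max(q[(s2, b)] for b in (0, 1))
            q[(s, a)] += 0.1 * (target - q[(s, a)])
    return q

def epsilon_greedy(q, s, eps=0.2):
    # Intermediate policy used for collection: mostly greedy, some exploration.
    if random.random() < eps:
        return random.choice((0, 1))
    return max((0, 1), key=lambda a: q[(s, a)])

random.seed(0)
batch = []                # the growing batch of transitions (s, a, r, s')
q = defaultdict(float)

# Growing batch: alternate between collecting data with the current
# intermediate policy and re-fitting Q on the enlarged batch.
for iteration in range(5):
    s = 0
    for _ in range(20):
        a = epsilon_greedy(q, s)
        s2, r = env_step(s, a)
        batch.append((s, a, r, s2))
        s = 0 if s2 == 4 else s2   # restart the episode at the goal
    q = fit_q(batch)

greedy = {s: max((0, 1), key=lambda a: q[(s, a)]) for s in range(5)}
print(greedy)
```

The key structural point is the outer loop: each pass grows `batch` with transitions gathered under the latest intermediate policy before re-fitting, which is exactly what a purely historical dataset cannot give you.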

So, if you only have historical data, it is not possible to grow the batch with new data, i.e., in your case it is not possible to implement growing batch Q-learning.

You can read a detailed explanation in chapter 2 of the book: Wiering, Marco, and Martijn van Otterlo, eds. Reinforcement Learning: State-of-the-Art. Springer, 2012. Link to the chapter

Pablo EM
  • Thanks! Do you know how I can evaluate the performance? I guess the only way is to take it online and interact with the environment... – ChiefsCreation Oct 27 '15 at 16:35
  • Yes, it is the only way I know. If you have few states/actions and a **lot** of data, you can try an approach similar to this paper: http://arxiv.org/abs/1003.5956 The idea is to take from the complete data set only the state/action pairs that match the policy you have learned. But as I say, this is only feasible if you have a lot of data and few state/action pairs. – Pablo EM Oct 27 '15 at 17:07
  • Thanks! I am afraid I don't have much data. BTW, is it possible for me to do some policy evaluation using Monte Carlo methods like what this paper mentions: http://jmlr.csail.mit.edu/proceedings/papers/v9/fonteneau10a/fonteneau10a.pdf? But I don't think policy evaluation methods work in my case... Since I get my policy from the historical data, it doesn't make sense for me to use the same data to evaluate my policy. Right? – ChiefsCreation Nov 02 '15 at 01:15
  • i) Maybe the method described in the paper is useful for your case. However, the paper is quite theoretical and only provides one example on an academic problem, so it is difficult to know the expected performance on more realistic (complex) problems. ii) The important thing about your historical data is whether it was collected following a fixed policy (i.e., the data only contains information about one policy) or following a policy with some randomness. In other words, does your data contain exploratory actions? Exploration is a necessary condition for learning useful policies. – Pablo EM Nov 02 '15 at 10:02
  • Thanks! The historical data is the only thing I have, so I guess I have to learn a policy regardless of whether the data is random or not. So do you mean that other available policy evaluation methods (such as Monte Carlo policy evaluation) can be used in my case? I thought policy evaluation means finding the state value function for a given policy. Since I learn my policy from a state value function, does it make sense for me to back-calculate the state value function using the same data? – ChiefsCreation Nov 02 '15 at 13:59
  • Welcome! I can imagine that the data is the only thing you have, but before starting to work on the problem, you should think about whether you can obtain something useful from your data. You are right, policy evaluation means finding the value function of a given policy. But the value function measures the quality of the policy, i.e., it's a possible way of evaluating performance. Anyway, you can probably find more (and better) advice from a very specific community in the Reinforcement Learning Mailing List (https://groups.google.com/forum/#!forum/rl-list). – Pablo EM Nov 03 '15 at 09:18
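The filtering idea mentioned in the comments (keeping only logged transitions whose action agrees with the learned policy, then averaging their observed rewards) can be sketched as follows. The function, dataset, and policy here are all hypothetical illustrations, not from the linked paper:

```python
def matched_average_reward(dataset, policy):
    """Estimate a policy's per-step reward from logged data.

    dataset: list of (state, action, reward) tuples from historical logs.
    policy:  dict mapping state -> action chosen by the learned policy.
    Only transitions whose logged action matches the policy's choice
    contribute to the estimate; with too little overlap it is unusable.
    """
    matched = [r for s, a, r in dataset if policy.get(s) == a]
    if not matched:
        return None  # no matching pairs: the estimate is undefined
    return sum(matched) / len(matched)

# Toy log and toy learned policy (illustrative values only).
log = [(0, 1, 0.0), (1, 1, 1.0), (1, 0, 0.0), (2, 1, 1.0)]
pi = {0: 1, 1: 1, 2: 0}
print(matched_average_reward(log, pi))  # → 0.5 (average of 0.0 and 1.0)
```

This makes concrete why the approach needs lots of data and few state/action pairs: the estimate only uses the fraction of the log that happens to agree with the learned policy, and that fraction shrinks quickly as the state/action space grows.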