I need to create a state space for my RL problem, which has about 10 state variables, each of which can take about 2 or 3 values. That would make the state space about 600,000 states. How do I implement this in Python?
1 Answer
Given the number of states in your problem, maybe you should consider using some kind of function approximation instead of a tabular representation.
If you finally decide to use a table with 600k rows and as many columns as actions, maybe a pandas DataFrame could work.
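
If you go the tabular route, the main thing you need is a way to map a combination of variable values to a row of the table. Here is a minimal sketch of one way to do that, using only the standard library and a plain list of lists rather than a DataFrame (the same row index would work for either). The cardinalities, the number of actions, and the helper names `state_to_index` and `index_to_state` are assumptions made up for the illustration. Note that the number of rows is the product of the per-variable value counts, so it is worth computing that product for your real variables first.

```python
from math import prod

# Hypothetical cardinalities: how many values each of the 10 state variables can take.
CARDINALITIES = (3, 3, 2, 3, 2, 3, 3, 2, 3, 3)
N_ACTIONS = 4  # hypothetical number of actions

# The table needs one row per combination of variable values.
N_STATES = prod(CARDINALITIES)


def state_to_index(state):
    """Map a tuple of per-variable values to a single row index (mixed-radix encoding)."""
    index = 0
    for value, size in zip(state, CARDINALITIES):
        if not 0 <= value < size:
            raise ValueError(f"value {value} is out of range for a variable with {size} values")
        index = index * size + value
    return index


def index_to_state(index):
    """Inverse of state_to_index: recover the tuple of per-variable values."""
    values = []
    for size in reversed(CARDINALITIES):
        index, value = divmod(index, size)
        values.append(value)
    return tuple(reversed(values))


# One row per state, one column per action, all Q-values initialised to zero.
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

state = (2, 0, 1, 1, 0, 2, 1, 1, 0, 2)
row = state_to_index(state)
q_table[row][1] = 0.5  # set Q(state, action=1)
```

The encoding treats the state tuple as digits of a number whose base changes per position, so every valid state gets a unique row between `0` and `N_STATES - 1`.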

Pablo EM
-
Yes, this is exactly what I was looking for. I am thinking about using a neural network for the function approximation, but in that case do I need to one-hot encode the 600k states and feed that into the network? – Yomal Jul 16 '18 at 15:45
-
Yes, I guess that if your variables are categorical and you want to use a neural network, you should probably use one-hot encoding. If they are numerical, maybe you can treat them as continuous values, even though they are discrete. Another option could be to use a different kind of function approximator that can digest categorical data, such as decision trees. – Pablo EM Jul 18 '18 at 03:59
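
One possible reading of the one-hot suggestion, offered as an assumption rather than something spelled out in the comment, is to encode each variable separately and concatenate the pieces, instead of one-hot encoding the joint state. The network input then has one slot per variable value rather than one slot per state. A minimal sketch, with hypothetical cardinalities and a hypothetical helper name `one_hot_state`:

```python
# Hypothetical cardinalities: how many values each of the 10 state variables can take.
CARDINALITIES = (3, 3, 2, 3, 2, 3, 3, 2, 3, 3)


def one_hot_state(state, cardinalities=CARDINALITIES):
    """Concatenate one one-hot block per variable into a flat feature vector."""
    features = []
    for value, size in zip(state, cardinalities):
        if not 0 <= value < size:
            raise ValueError(f"value {value} is out of range for a variable with {size} values")
        block = [0.0] * size
        block[value] = 1.0  # mark the value this variable currently takes
        features.extend(block)
    return features


x = one_hot_state((2, 0, 1, 1, 0, 2, 1, 1, 0, 2))
# len(x) equals sum(CARDINALITIES): the input size grows with the number of
# variable values, not with the number of joint states.
```

The resulting list can be turned into whatever tensor type your network library expects.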
-
Thanks, this helped a lot. When implementing this in Python, can you suggest how I could implement the transitions from one state to another given an action? Any reference would be greatly appreciated. – Yomal Jul 18 '18 at 14:47
-
Thanks! This is a different question altogether: the state transition function is separate from the RL algorithm. Usually, your environment is a function, `env(s,a)`, that takes the current state `s` and a given action `a`, and returns the next state `s'`, a flag signaling whether the episode has finished, and sometimes a reward (the reward can be seen as an independent function). – Pablo EM Jul 19 '18 at 02:59
-
Here [https://gym.openai.com/envs/#classic_control] you can see some typical toy environments. They are useful for understanding the environment concept (you can probably find simpler Python implementations of these environments by searching online). – Pablo EM Jul 19 '18 at 02:59
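
To make the `env(s, a)` idea from the comments above concrete, here is a minimal sketch of such a function. Everything about the dynamics is invented for illustration: the cardinalities, the `GOAL` state, the rule that action `a` advances variable `a` to its next value (wrapping around), and the reward of -1 per step. Your real transition rules would replace the body. The return order follows the comment (next state, finished flag, reward).

```python
# Hypothetical cardinalities: how many values each of the 10 state variables can take.
CARDINALITIES = (3, 3, 2, 3, 2, 3, 3, 2, 3, 3)
# Hypothetical terminal state: every variable at its highest value.
GOAL = tuple(size - 1 for size in CARDINALITIES)


def env(s, a):
    """Toy transition function: return (next_state, done, reward).

    Made-up dynamics: action `a` selects variable `a` and advances it to its
    next value, wrapping back to 0 after the last one.
    """
    if not 0 <= a < len(CARDINALITIES):
        raise ValueError(f"unknown action {a}")
    values = list(s)
    values[a] = (s[a] + 1) % CARDINALITIES[a]
    next_state = tuple(values)
    done = next_state == GOAL           # the episode ends when the goal is reached
    reward = 0.0 if done else -1.0      # made-up reward: -1 per non-terminal step
    return next_state, done, reward


s = (0,) * len(CARDINALITIES)
s, done, reward = env(s, a=2)  # advance the third variable by one value
```

An RL algorithm only ever calls `env` and looks at what comes back, which is why the transition logic can be written and tested on its own.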