Reinforcement learning DQN environment structure

Question

I am wondering how best to feed the changes my DQN agent makes to its environment back to the agent.

I have a battery model in which an agent observes a time-series forecast of 17 steps and 5 features. It then decides whether to charge or discharge.

I want to include its current state of charge (empty, half full, full, etc.) in its observation space (i.e. somewhere within the (17,5) dataframes I am feeding it).

I have several options: I can set a whole column to the state-of-charge value, set a whole row to it, or flatten the whole dataframe and set one value to the state-of-charge value.

Are any of these unwise? It seems a little rudimentary to me to set a whole column to a single value, but would it actually affect performance? I am wary of flattening the whole thing, as I plan to use either conv or LSTM layers (although the current model is just dense layers).

score 1 · Accepted Answer · answered Feb 03 '21 at 14:59

You would not want to add unnecessary, repetitive features to the state representation, as they might hamper your RL agent's convergence later, when you scale your model to larger input sizes (if that is in your plan).

Also, deciding how much information to give in the state representation is mostly experimental. The best way to start would be to give a single value as the battery state. If the model does not converge, you could then try the other options you mentioned in your question.
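As a rough illustration of the single-value option, here is a minimal NumPy sketch. The names `make_observation` and `flatten_for_dense`, the dictionary layout, and the scaling of the state of charge to [0, 1] are all assumptions made for the example, not something taken from your setup. It keeps the (17, 5) forecast intact and carries the state of charge as one extra scalar, so nothing is repeated across a row or column.

```python
import numpy as np

N_STEPS, N_FEATURES = 17, 5  # forecast shape from the question


def make_observation(forecast, state_of_charge):
    """Keep the forecast and the state of charge as two separate parts.

    forecast: array-like of shape (17, 5).
    state_of_charge: float, assumed here to be scaled to [0, 1].
    """
    forecast = np.asarray(forecast, dtype=np.float32)
    if forecast.shape != (N_STEPS, N_FEATURES):
        raise ValueError(f"expected shape {(N_STEPS, N_FEATURES)}, got {forecast.shape}")
    soc = np.array([state_of_charge], dtype=np.float32)
    return {"forecast": forecast, "soc": soc}


def flatten_for_dense(obs):
    """Concatenate both parts into one vector for a dense-only network."""
    return np.concatenate([obs["forecast"].ravel(), obs["soc"]])


# Placeholder forecast of zeros and a half-full battery.
obs = make_observation(np.zeros((N_STEPS, N_FEATURES)), 0.5)
vec = flatten_for_dense(obs)
print(vec.shape)  # (86,) = 17 * 5 forecast values + 1 state-of-charge value
```

With dense layers you would feed `vec` directly. If you later move to conv or LSTM layers, one possible design is to pass `obs["forecast"]` through those layers unflattened and concatenate `obs["soc"]` onto their output before the final dense layers, which avoids the flattening you are wary of.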
