I am wondering how best to feed the changes my DQN agent makes to its environment back into its own observations.
I have a battery model in which the agent observes a time-series forecast of 17 steps with 5 features. It then decides whether to charge or discharge.
I want to include its current state of charge (empty, half full, full, etc.) in its observation space (i.e. somewhere within the (17,5) dataframes I am feeding it).
I have several options: I can set a whole column to the state of charge value, set a whole row to it, or flatten the whole dataframe and set one value to the state of charge value.
Are any of these unwise? It seems a little rudimentary to me to set a whole column to a single value, but would it actually impact performance? I am wary of flattening the whole thing as I plan to use either conv or LSTM layers (although the current model is just dense layers).
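
For concreteness, here is a rough NumPy sketch of the three options I am considering. The function names are just placeholders I made up, the forecast is random stand-in data, and I am assuming the state of charge is a scalar normalised to [0, 1]. I also append an extra column or row here rather than overwriting existing forecast values, which is one possible reading of each option, not necessarily the right one.

```python
import numpy as np

N_STEPS, N_FEATURES = 17, 5  # forecast shape described above


def add_soc_column(forecast, soc):
    """Option 1: append a constant column, giving shape (17, 6)."""
    soc_col = np.full((forecast.shape[0], 1), soc, dtype=forecast.dtype)
    return np.concatenate([forecast, soc_col], axis=1)


def add_soc_row(forecast, soc):
    """Option 2: append a constant row, giving shape (18, 5)."""
    soc_row = np.full((1, forecast.shape[1]), soc, dtype=forecast.dtype)
    return np.concatenate([forecast, soc_row], axis=0)


def add_soc_flat(forecast, soc):
    """Option 3: flatten and append one scalar, giving shape (86,)."""
    return np.append(forecast.ravel(), soc).astype(forecast.dtype)


# Random stand-in for a real forecast (hypothetical data).
forecast = np.random.default_rng(0).random((N_STEPS, N_FEATURES)).astype(np.float32)
soc = 0.5  # assumed: state of charge normalised to [0, 1]

print(add_soc_column(forecast, soc).shape)
print(add_soc_row(forecast, soc).shape)
print(add_soc_flat(forecast, soc).shape)
```

Options 1 and 2 keep the 2-D layout that conv or LSTM layers would expect, while option 3 only suits the current dense model.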