I want to use DQN for a recommendation system in the retail industry,
but the problem is that the state space of this problem is time-inhomogeneous and non-deterministic
(compared to Atari games).
I came up with two approaches to this problem:
- make the state transitions deterministic
- use historical data to estimate transition probabilities, then sample state transitions from those probabilities
but... neither of them seems to make sense.
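
A minimal sketch of the second idea, assuming the historical data is a log of (state, action, next_state) tuples. The state names, action names, and the `history` list are made up for illustration, and `estimate_transitions` / `sample_next_state` are hypothetical helper names, not from any library:

```python
import random
from collections import Counter, defaultdict


def estimate_transitions(history):
    """Estimate P(next_state | state, action) by counting logged transitions."""
    counts = defaultdict(Counter)
    for state, action, next_state in history:
        counts[(state, action)][next_state] += 1

    probs = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        probs[key] = {s: n / total for s, n in counter.items()}
    return probs


def sample_next_state(probs, state, action, rng=random):
    """Sample a next state from the empirical distribution for (state, action)."""
    dist = probs[(state, action)]
    states = list(dist)
    weights = [dist[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


# Hypothetical toy log: states are session stages, actions are recommended categories.
history = [
    ("browsing", "shoes", "cart"),
    ("browsing", "shoes", "browsing"),
    ("browsing", "shoes", "cart"),
    ("browsing", "hats", "browsing"),
    ("cart", "shoes", "purchase"),
]
probs = estimate_transitions(history)
next_state = sample_next_state(probs, "browsing", "shoes")
```

Note that this pools all of the history together, so it ignores the time-inhomogeneity. One possible variation (also an assumption) is to add a time bucket such as week or season to the `(state, action)` key, at the cost of fewer samples per estimate.
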
Could somebody point out the issues with these approaches?
If I want to build a recommendation system based on reinforcement learning,
where should I start?