DQN (Deep Q-Network) is a multi-layered neural network approach that adds a target network and experience replay to Q-learning.
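As a minimal sketch of those two additions (all names below are illustrative, not from any particular library):

import random
from collections import deque

import numpy as np

replay = deque(maxlen=50_000)  # experience replay: transitions are stored and reused in random minibatches

def dqn_target(reward, next_q_values, done, gamma=0.99):
    # next_q_values come from a lagged target network, not the online network,
    # which keeps the bootstrap target from chasing its own updates
    if done:
        return reward
    return reward + gamma * float(np.max(next_q_values))

# training draws a random minibatch once enough transitions have accumulated
batch = random.sample(replay, 32) if len(replay) >= 32 else []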
Questions tagged [dqn]
206 questions
0
votes
1 answer
Why did I get a ModuleNotFoundError after I installed keras?
I did this to study DQN. I am sure I have installed keras, because when I type pip install keras into the command prompt, all I get is "Requirement already satisfied".
My code:
from dqn_agent import DQNAgent
from tetris import Tetris
from datetime…
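
A common cause of this symptom, which the snippet alone can't confirm, is that pip installed keras into a different Python environment than the one running the script. One way to check (standard pip commands, nothing project-specific):

import sys
print(sys.executable)  # the interpreter actually running this script

# Compare with where pip put the package:
#   python -m pip show keras
# Installing via the same interpreter avoids the mismatch:
#   python -m pip install keras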

NoobCoder
- 1
- 4
0
votes
1 answer
Deep Q-Learning for CartPole with TensorFlow in Python
I know there are many similar topics discussed on StackOverflow, but I have done quite a lot of research both on StackOverflow and elsewhere on the Internet, and I couldn't find a solution.
I am trying to implement the classic Deep Q-Learning algorithm to solve…

LazyAnalyst
- 426
- 3
- 16
0
votes
2 answers
What's the principle for designing the reward function of a DQN?
I'm designing a reward function for a DQN model, the trickiest part of deep reinforcement learning. I referred to several cases and noticed the reward is usually set within [-1, 1]. Considering that if the negative reward is triggered fewer times, more…
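
The [-1, 1] range mentioned here is often enforced by simply clipping the raw environment reward, as in the original DQN Atari setup; a one-function sketch:

import numpy as np

def clip_reward(raw_reward):
    # keeps TD-error magnitudes on a comparable scale across states and environments
    return float(np.clip(raw_reward, -1.0, 1.0))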

赵天阳
- 109
- 1
- 11
0
votes
1 answer
How to resolve array input shape errors for an LSTM DQN?
I am building a DQN with LSTM layers.
I am trying to pass 96-timestep, 33-feature arrays to the model for training, i.e. shape=(96, 33).
I am also trying to implement a post-padded mask (value 0.0) to accommodate variable-length sequences (max length = 96).
model =…
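
A minimal Keras sketch matching the shapes described; the layer size and action count are illustrative, not taken from the question:

from tensorflow.keras.layers import Dense, LSTM, Masking
from tensorflow.keras.models import Sequential

n_actions = 4  # hypothetical number of actions

model = Sequential([
    Masking(mask_value=0.0, input_shape=(96, 33)),  # skip post-padded timesteps
    LSTM(64),                                       # expects (batch, timesteps, features); batch is implicit
    Dense(n_actions, activation="linear"),          # one Q-value per action
])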

MarkD
- 395
- 3
- 14
0
votes
0 answers
Why does my agent always take the same action in RL?
I'm trying to reproduce the work in the paper Demand Response for Home Energy Management Using Reinforcement Learning and Artificial Neural Network. I want to optimize the power consumption for home appliances. The action space is a different…
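
One frequent cause of an agent collapsing onto a single action is missing or too quickly decayed exploration; a standard epsilon-greedy selection looks like this (a generic sketch, not the paper's method):

import random
import numpy as np

def select_action(q_values, epsilon):
    # with probability epsilon explore uniformly, otherwise act greedily
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))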

Aya
- 1
- 2
0
votes
1 answer
How to determine whether to use positive or negative rewards in a DQN model?
I'm new to deep reinforcement learning and the DQN model. I used OpenAI Gym to reproduce the CartPole-v0 and MountainCar-v0 experiments.
I referred to code from GitHub,…
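
For reference, the two environments named use opposite conventions: CartPole-v0 returns +1 for every step the pole stays up, while MountainCar-v0 returns -1 per step until the goal is reached. A quick way to see this, assuming the classic 4-tuple gym step API:

import gym

for name in ("CartPole-v0", "MountainCar-v0"):
    env = gym.make(name)
    env.reset()
    _, reward, _, _ = env.step(env.action_space.sample())
    print(name, reward)  # +1.0 for CartPole, -1.0 for MountainCar
    env.close()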

赵天阳
- 109
- 1
- 11
0
votes
0 answers
TensorFlow not utilizing all CPU cores
I am using reinforcement learning in combination with a neural network (DQN). I have a MacBook with a 6-core i7 and an AMD GPU. TensorFlow doesn't see the GPU, so it uses the CPU automatically. When I run the script, I see in Activity Monitor that the…
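
If the bottleneck is TensorFlow's thread pools rather than the workload itself, the pool sizes can be set explicitly (TF 2.x API; the counts below are illustrative for a 6-core machine, and the small networks typical of DQN often cannot saturate them anyway):

import tensorflow as tf

# must run before any ops execute
tf.config.threading.set_intra_op_parallelism_threads(6)  # threads inside a single op, e.g. a matmul
tf.config.threading.set_inter_op_parallelism_threads(2)  # independent ops run concurrently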

SirPVP
- 45
- 5
0
votes
0 answers
For DQN with prioritized experience replay, what is the TD error for terminal states?
While calculating the TD error of the target network in Prioritized Experience Replay, we have, from equation (2) in Appendix B of the paper:
$$\delta_t := R_t + \gamma \max_a Q(S_t, a) - Q(S_{t-1}, A_{t-1})$$
It seems unnecessary / incorrect to me that the…
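
The usual resolution is that the bootstrap term is masked out at terminal states, so the TD error there reduces to $R_t - Q(S_{t-1}, A_{t-1})$. A common vectorized form (names illustrative):

import numpy as np

def td_errors(rewards, dones, gamma, q_next_max, q_taken):
    # dones is 1.0 at terminal transitions, zeroing the bootstrap term there
    targets = rewards + gamma * (1.0 - dones) * q_next_max
    return targets - q_taken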

Srikiran
- 309
- 1
- 3
- 9
0
votes
1 answer
How does Double DQN work?
What is the idea behind Double DQN?
The Q values used to update the online network are calculated with the Bellman equation:
value = reward + discount_factor *…
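
Double DQN modifies exactly that target: the online network chooses the next action, but the target network evaluates it, which curbs the overestimation bias of taking a max over noisy Q-values. A sketch (the array names are hypothetical):

import numpy as np

def double_dqn_target(reward, discount_factor, q_online_next, q_target_next, done):
    if done:
        return reward
    best_action = int(np.argmax(q_online_next))                   # selection: online network
    return reward + discount_factor * q_target_next[best_action]  # evaluation: target network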

joseph
- 181
- 9
0
votes
1 answer
Deep Q-Learning agent finds a solution, then diverges again
I am trying to train a DQN agent to solve OpenAI Gym's CartPole-v0 environment. I started with this person's implementation just to get some hands-on experience. What I noticed is that during training, after many episodes the agent finds the…

alex
- 1,905
- 26
- 51
0
votes
1 answer
Learning rate decay with respect to cumulative reward?
In deep reinforcement learning, is there any way to decay the learning rate with respect to cumulative reward? That is, decay the learning rate as the agent learns and the reward grows?
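
There is no standard reward-driven schedule, but one can wire a running reward estimate into the optimizer by hand. A hypothetical Keras sketch (the target reward and the decay shape are placeholders, not an established recipe):

from tensorflow.keras import backend as K

def decay_lr_on_reward(optimizer, mean_reward, base_lr=1e-3, target_reward=200.0):
    # shrink the learning rate as the running mean reward approaches a chosen target
    progress = min(max(mean_reward / target_reward, 0.0), 1.0)
    K.set_value(optimizer.learning_rate, base_lr * (1.0 - 0.9 * progress))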

M. Awais Jadoon
- 35
- 10
0
votes
1 answer
Prioritized Experience Replay for stochastic environment
I tried to use the following paper to improve the learning of my agent: https://arxiv.org/pdf/1511.05952.pdf
While it seems to work very well in deterministic environments, I feel like it would actually make things worse in a stochastic one.
Let's assume…
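
For reference, the paper's proportional variant samples transition $i$ with probability

$$P(i) = \frac{p_i^\alpha}{\sum_k p_k^\alpha}, \qquad p_i = |\delta_i| + \epsilon$$

so a transition whose large $|\delta_i|$ comes from environment noise rather than a learnable error keeps being replayed, which is the effect the asker is worried about.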

user3548298
- 186
- 1
- 1
- 13
0
votes
1 answer
Is it okay to remove the oldest experiences of a DQN?
I have created a DQN with a max memory size of 100000. I have a function that removes the oldest element in the memory when its size exceeds the max. When I ran it for 200 episodes, I noticed that the memory was already full at the…
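
Evicting the oldest transition is the standard FIFO behavior, and a bounded deque gives it for free; a sketch with the question's capacity:

from collections import deque

replay_memory = deque(maxlen=100000)
# appending past maxlen silently drops the oldest entry; no manual removal function needed
replay_memory.append(("state", "action", 1.0, "next_state", False))  # placeholder transition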

KKK
- 507
- 3
- 12
0
votes
1 answer
Q agent is learning not to take any actions
I'm training a deep Q network to trade stocks. It has two possible actions: 0: wait; 1: buy a stock if none is held, sell it if one is held. It gets as input the value of the stock it bought, the current value of the stock, and the values of…

RichKat
- 57
- 1
- 8
0
votes
1 answer
Is my DDQN network correctly implemented?
Here's my replay/train function implementation. I made the DDQN so that model lags behind model2 by one batch during replay/training. By setting self.ddqn = False it becomes a normal DQN. Is this correctly implemented? I am using this paper as…
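
A compact version of the toggle described (in the question's naming, model2 plays the online role and model the lagged one; batching details below are illustrative):

import numpy as np

def td_targets(rewards, next_states, dones, gamma, online_net, lagged_net, ddqn=True):
    q_next_online = online_net.predict(next_states, verbose=0)
    q_next_lagged = lagged_net.predict(next_states, verbose=0)
    if ddqn:
        # DDQN: the online network selects the action, the lagged network evaluates it
        best = np.argmax(q_next_online, axis=1)
        q_next = q_next_lagged[np.arange(len(best)), best]
    else:
        # plain DQN: the lagged network both selects and evaluates
        q_next = np.max(q_next_lagged, axis=1)
    return rewards + gamma * (1.0 - dones) * q_next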

Linsu Han
- 135
- 1
- 8