Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function giving the expected utility (return) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well or how badly the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states or actions better than currently estimated). A common, simple way of handling this trade-off is an epsilon-greedy policy.
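
For concreteness, here is a minimal tabular sketch of epsilon-greedy selection combined with the one-step Q-learning update; the function names and hyperparameter values are illustrative, not from any particular library.

    import random
    from collections import defaultdict

    def epsilon_greedy(Q, state, actions, epsilon=0.1):
        # Explore with probability epsilon, otherwise act greedily.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def q_learning_update(Q, state, action, reward, next_state, actions,
                          alpha=0.1, gamma=0.99):
        # Off-policy TD target: bootstrap from the best next action,
        # regardless of which action the behaviour policy actually takes.
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

    Q = defaultdict(float)  # unseen (state, action) pairs default to 0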

447 questions
3
votes
0 answers

Self-driving car not improving with Q-Learning

I'm working on a project where I'm trying to teach a car how to drive via Q-learning in Python, but I'm having a problem: it seems like the car never learns anything (even after 1,000,000 episodes). Since I really can't figure out where my problem…
3
votes
5 answers

How does DQN work in an environment where reward is always -1

Given that the OpenAI Gym environment MountainCar-v0 ALWAYS returns -1.0 as a reward (even when the goal is achieved), I don't understand how DQN with experience replay converges, yet I know it does, because I have working code that proves it. By…
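
The resolution, roughly, is that constant -1 rewards still rank behaviours: the episode ends at the goal, so shorter episodes accumulate less negative return, and terminal states are not bootstrapped from. A small worked example (step counts are illustrative):

    GAMMA = 0.99

    def episode_return(steps, gamma=GAMMA):
        # Discounted sum of a -1 reward on every step until termination.
        return sum(-1.0 * gamma ** t for t in range(steps))

    print(episode_return(110))  # reach the goal in 110 steps: about -66.9
    print(episode_return(200))  # hit the 200-step time limit: about -86.6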
3
votes
1 answer

Reinforcement Learning with Keras model

I was trying to implement a Q-learning algorithm in Keras. According to the articles I found, these are the lines of code: for state, action, reward, next_state, done in sample_batch: target = reward if not done: #formula …
user9900027
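
The elided #formula is presumably the standard one-step DQN target. A hedged sketch of what that loop commonly looks like in Keras-style code; model, sample_batch, and gamma come from the question's own snippet or are assumptions, and each state is assumed to already carry a batch dimension of 1:

    import numpy as np

    gamma = 0.99  # assumed discount factor

    for state, action, reward, next_state, done in sample_batch:
        target = reward
        if not done:
            # Bootstrap: reward plus the discounted best Q-value in next_state
            target = reward + gamma * np.max(model.predict(next_state)[0])
        # Only the taken action's output is pushed toward the target
        target_vec = model.predict(state)[0]
        target_vec[action] = target
        model.fit(state, target_vec.reshape(1, -1), epochs=1, verbose=0)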
3
votes
1 answer

First-Visit vs Every-Visit Monte Carlo

I have recently been looking into reinforcement learning. For this, I have been reading the famous book by Sutton, but there is something I do not fully understand yet. For Monte-Carlo learning, we can choose between first-visit and every-visit…
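
The distinction: first-visit MC averages returns only from a state's first occurrence in each episode, while every-visit MC averages from all occurrences; both converge to the true value function. A minimal sketch (function and variable names are illustrative):

    from collections import defaultdict

    def mc_evaluate(episode, V, counts, gamma=1.0, first_visit=True):
        # episode: list of (state, reward) pairs from one rollout,
        # where reward is the reward received after leaving that state.
        G, returns = 0.0, []
        for state, reward in reversed(episode):
            G = reward + gamma * G          # return following each step
            returns.append((state, G))
        returns.reverse()
        seen = set()
        for state, G in returns:
            if first_visit and state in seen:
                continue  # first-visit MC ignores repeats within an episode
            seen.add(state)
            counts[state] += 1
            V[state] += (G - V[state]) / counts[state]  # incremental mean

    V, counts = defaultdict(float), defaultdict(int)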
3
votes
1 answer

Updating table values live with Dash and Plotly

I am trying to build a Dash app in Python to simulate a Q-learning problem. Before implementing the algorithm, I am just focusing on making the table work: incrementing the values randomly and waiting 1 second between increments. Q is a pandas…
Pablo Ruiz Ruiz • 605 • 1 • 6 • 23
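
A minimal sketch of that pattern, assuming Dash 2.x with dash_table and a dcc.Interval that fires once per second; the component ids and table shape are made up for illustration:

    import numpy as np
    import pandas as pd
    from dash import Dash, Input, Output, dash_table, dcc, html

    Q = pd.DataFrame(np.zeros((4, 4)), columns=list("abcd"))  # stand-in Q-table

    app = Dash(__name__)
    app.layout = html.Div([
        dash_table.DataTable(
            id="q-table",
            columns=[{"name": c, "id": c} for c in Q.columns],
            data=Q.to_dict("records"),
        ),
        dcc.Interval(id="tick", interval=1000),  # milliseconds
    ])

    @app.callback(Output("q-table", "data"), Input("tick", "n_intervals"))
    def bump(_):
        # Increment one random cell per tick and re-render the table.
        i, j = np.random.randint(Q.shape[0]), np.random.randint(Q.shape[1])
        Q.iat[i, j] += 1
        return Q.to_dict("records")

    if __name__ == "__main__":
        app.run(debug=True)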
3
votes
2 answers

Why and when is deep reinforcement learning needed instead of q-learning?

I've been studying reinforcement learning, and understand the concepts of value/policy iteration, TD(1)/TD(0)/TD(Lambda), and Q-learning. What I don't understand is why Q-learning can't be used for everything. Why do we need "deep" reinforcement…
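
The usual answer: tabular Q-learning stores one value per (state, action) pair, so it only works when the state space is small enough to enumerate and revisit; deep RL replaces the table with a function approximator that generalises across states. A back-of-envelope illustration:

    import math

    # An 84x84 grayscale screen with 256 intensity levels (the DQN input size)
    # has 256 ** (84 * 84) distinct states -- no table can hold that.
    log10_states = 84 * 84 * math.log10(256)
    print(f"roughly 10^{log10_states:.0f} possible screens")  # ~10^16991

    # By contrast, a 4x4 gridworld with 4 actions needs only 64 table entries,
    # which is exactly where tabular Q-learning works well.
    print(4 * 4 * 4)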
3
votes
1 answer

What is the difference between policy gradient methods and neural network-based action-value methods?

What is the difference between policy gradient methods and neural network-based action-value methods?
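
In short: action-value methods learn Q(s, a) and derive the policy implicitly (e.g. epsilon-greedily over the Q-values), while policy-gradient methods parameterise the policy itself and follow the gradient of expected return. A toy sketch of how each one acts (numbers are made up):

    import numpy as np

    q_values = np.array([1.3, 0.7])  # an action-value net outputs one Q per action
    greedy_action = int(np.argmax(q_values))  # value-based: act (epsilon-)greedily

    # A policy net outputs a distribution over actions directly; here we fake
    # one with a softmax over the same numbers purely for illustration.
    probs = np.exp(q_values) / np.exp(q_values).sum()
    sampled_action = int(np.random.choice(len(probs), p=probs))  # PG: sample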
3
votes
2 answers

Why do we need exploitation in RL (Q-learning) for convergence?

I am implementing the Q-learning algorithm and I observed that my Q-values are not converging to the optimal Q-values, even though the policy seems to be converging. I defined the action-selection strategy as epsilon-greedy, and epsilon is decreasing by 1/N…
3
votes
1 answer

Trouble implementing DeepMind's Breakout model

I am trying to follow DeepMind's paper on Q-learning for the game Breakout, and so far the performance is not improving, i.e. it is not learning anything at all. Instead of experience replay, I am just running the game, saving some data, and training and…
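
Dropping experience replay is a likely culprit: in the DQN paper, sampling past transitions uniformly breaks the correlation between consecutive frames and reuses data. A minimal buffer sketch (capacity and batch size are illustrative):

    import random
    from collections import deque

    class ReplayBuffer:
        """FIFO store of transitions; uniform sampling decorrelates updates."""
        def __init__(self, capacity=100_000):
            self.buffer = deque(maxlen=capacity)

        def push(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size=32):
            return random.sample(self.buffer, batch_size)

    # Usage: push every step; once len(buffer) > batch_size, train on sample().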
3
votes
2 answers

Q-learning epsilon-greedy update

I am trying to understand the epsilon-greedy method in DQN. I am learning from the code available at https://github.com/karpathy/convnetjs/blob/master/build/deepqlearn.js. The following is the update rule for epsilon, which changes with age as…
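
To the best of my reading of that file, the rule is a linear anneal over the agent's age; a Python transcription of that scheme (the parameter defaults are my recollection of the JS source, so treat them as assumptions):

    def epsilon_at(age, eps_min=0.05, burnin=3000, total=100_000):
        # 1.0 during the burn-in phase, then a linear decay with age,
        # floored at eps_min once annealing is complete.
        return min(1.0, max(eps_min, 1.0 - (age - burnin) / (total - burnin)))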
3
votes
1 answer

Why would a DQN give similar values to all actions in the action space (2) for all observations

I have a DQN algorithm that learns (the loss converges to 0), but unfortunately it learns a Q-value function such that the Q-values for the 2 possible actions are very similar. It is worth noting that the Q-values change by very…
3
votes
1 answer

How do you update Q-values for a two-player game

For a single-player game, Q-value updates are pretty intuitive: the current state and the future state depend on the strategy of a single player. But for two players this isn't the case. Consider the scenario where the opponent wins and the game is…
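
One common approach for zero-sum, alternating-move games is a negamax-style update: the state after your move belongs to the opponent, whose best value is the negative of yours. A hedged sketch with a shared Q-table whose values are always from the mover's perspective:

    def two_player_q_update(Q, state, action, reward, next_state, next_actions,
                            alpha=0.1, gamma=0.99):
        if not next_actions:  # terminal: no bootstrap, just the final reward
            target = reward
        else:
            # The opponent moves in next_state; their best outcome is our worst.
            target = reward - gamma * max(Q[(next_state, b)] for b in next_actions)
        Q[(state, action)] += alpha * (target - Q[(state, action)])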
3
votes
0 answers

DQN on recommendation system

I want to use DQN in a recommendation system for the retail industry, but the problem is that the state space is time-inhomogeneous and not deterministic (compared to Atari games). I figured out two methods for this problem: make state-transition…
3
votes
1 answer

What is phi in the Deep Q-learning algorithm

I'm trying to make a learning football game from scratch in Java, and I'm trying to implement reinforcement learning with Google DeepMind's Deep Q-learning algorithm (without the convolutional network, though). I've already built the neural network and…
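
In the DQN paper, phi is the fixed preprocessing map from the recent observation history to the network input (there: grayscale, resize to 84x84, stack the last 4 frames). Without a convnet, it is simply whatever deterministic function turns recent observations into the state vector. A sketch for vector observations (class name and stack size are illustrative, and in Python rather than the asker's Java):

    import numpy as np
    from collections import deque

    class Phi:
        """Stack the last k observations into one state vector."""
        def __init__(self, k=4):
            self.frames = deque(maxlen=k)

        def __call__(self, obs):
            self.frames.append(np.asarray(obs, dtype=np.float32))
            while len(self.frames) < self.frames.maxlen:
                self.frames.append(self.frames[-1])  # pad at episode start
            return np.concatenate(list(self.frames))  # input to the Q-network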
3
votes
1 answer

Action selection with softmax?

I know this might be a pretty stupid question to ask, but what the hell… At the moment I am trying to implement a softmax action selector, which uses the Boltzmann distribution. [Formula] What I am a bit unsure about is how you know whether you want to…
Vato • 37 • 1 • 8
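
For reference, a common way to implement Boltzmann (softmax) action selection, with the temperature tau controlling the exploration level:

    import numpy as np

    def softmax_action(q_values, tau=1.0):
        # P(a) is proportional to exp(Q(a) / tau); large tau gives near-uniform
        # exploration, tau -> 0 approaches greedy behaviour.
        q = np.asarray(q_values, dtype=np.float64)
        prefs = (q - q.max()) / tau  # subtract max for numerical stability
        probs = np.exp(prefs) / np.exp(prefs).sum()
        return int(np.random.choice(len(q), p=probs))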