Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function giving the expected utility (return) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well or how badly the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states or actions better than currently estimated). A common, simple way of handling this trade-off is an epsilon-greedy policy.
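
For concreteness, here is a minimal tabular sketch of epsilon-greedy selection combined with the one-step Q-learning update; the function names and hyperparameter values are illustrative, not from any particular library.

    import random
    from collections import defaultdict

    def epsilon_greedy(Q, state, actions, epsilon=0.1):
        # Explore with probability epsilon, otherwise act greedily.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def q_learning_update(Q, state, action, reward, next_state, actions,
                          alpha=0.1, gamma=0.99):
        # Off-policy TD target: bootstrap from the best next action,
        # regardless of which action the behaviour policy actually takes.
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

    Q = defaultdict(float)  # unseen (state, action) pairs default to 0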

447 questions
3
votes
0 answers

Self-driving car not improving with Q-Learning

I'm working on a project where I'm trying to teach a car how to drive via Q-learning in Python, but I'm having a problem: it seems like the car never learns anything (even after 1,000,000 episodes). Since I really can't figure out where my problem…
3
votes
5 answers

How does DQN work in an environment where reward is always -1

Given that the OpenAI Gym environment MountainCar-v0 ALWAYS returns -1.0 as a reward (even when the goal is achieved), I don't understand how DQN with experience replay converges, yet I know it does, because I have working code that proves it. By…
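
The resolution, roughly, is that constant -1 rewards still rank behaviours: the episode ends at the goal, so shorter episodes accumulate less negative return, and terminal states are not bootstrapped from. A small worked example (step counts are illustrative):

    GAMMA = 0.99

    def episode_return(steps, gamma=GAMMA):
        # Discounted sum of a -1 reward on every step until termination.
        return sum(-1.0 * gamma ** t for t in range(steps))

    print(episode_return(110))  # reach the goal in 110 steps: about -66.9
    print(episode_return(200))  # hit the 200-step time limit: about -86.6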
3
votes
1 answer

Reinforcement Learning with Keras model

I was trying to implement a Q-learning algorithm in Keras. According to the articles I found, these are the lines of code: for state, action, reward, next_state, done in sample_batch: target = reward if not done: #formula …
user9900027
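
The elided #formula is presumably the standard one-step DQN target. A hedged sketch of what that loop commonly looks like in Keras-style code; model, sample_batch, and gamma come from the question's own snippet or are assumptions, and each state is assumed to already carry a batch dimension of 1:

    import numpy as np

    gamma = 0.99  # assumed discount factor

    for state, action, reward, next_state, done in sample_batch:
        target = reward
        if not done:
            # Bootstrap: reward plus the discounted best Q-value in next_state
            target = reward + gamma * np.max(model.predict(next_state)[0])
        # Only the taken action's output is pushed toward the target
        target_vec = model.predict(state)[0]
        target_vec[action] = target
        model.fit(state, target_vec.reshape(1, -1), epochs=1, verbose=0)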
3
votes
1 answer

First-Visit vs Every-Visit Monte Carlo

I have recently been looking into reinforcement learning. For this, I have been reading the famous book by Sutton, but there is something I do not fully understand yet. For Monte-Carlo learning, we can choose between first-visit and every-visit…
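
The distinction: first-visit MC averages returns only from a state's first occurrence in each episode, while every-visit MC averages from all occurrences; both converge to the true value function. A minimal sketch (function and variable names are illustrative):

    from collections import defaultdict

    def mc_evaluate(episode, V, counts, gamma=1.0, first_visit=True):
        # episode: list of (state, reward) pairs from one rollout,
        # where reward is the reward received after leaving that state.
        G, returns = 0.0, []
        for state, reward in reversed(episode):
            G = reward + gamma * G          # return following each step
            returns.append((state, G))
        returns.reverse()
        seen = set()
        for state, G in returns:
            if first_visit and state in seen:
                continue  # first-visit MC ignores repeats within an episode
            seen.add(state)
            counts[state] += 1
            V[state] += (G - V[state]) / counts[state]  # incremental mean

    V, counts = defaultdict(float), defaultdict(int)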
3
votes
1 answer

Updating table values live with Dash and Plotly

I am trying to build a Dash app in Python to simulate a Q-learning problem. Before implementing the algorithm, I am just focusing on making the table work: incrementing the values randomly and waiting 1 second between increments. Q is a pandas…
Pablo Ruiz Ruiz • 605 • 1 • 6 • 23
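
A minimal sketch of that pattern, assuming Dash 2.x with dash_table and a dcc.Interval that fires once per second; the component ids and table shape are made up for illustration:

    import numpy as np
    import pandas as pd
    from dash import Dash, Input, Output, dash_table, dcc, html

    Q = pd.DataFrame(np.zeros((4, 4)), columns=list("abcd"))  # stand-in Q-table

    app = Dash(__name__)
    app.layout = html.Div([
        dash_table.DataTable(
            id="q-table",
            columns=[{"name": c, "id": c} for c in Q.columns],
            data=Q.to_dict("records"),
        ),
        dcc.Interval(id="tick", interval=1000),  # milliseconds
    ])

    @app.callback(Output("q-table", "data"), Input("tick", "n_intervals"))
    def bump(_):
        # Increment one random cell per tick and re-render the table.
        i, j = np.random.randint(Q.shape[0]), np.random.randint(Q.shape[1])
        Q.iat[i, j] += 1
        return Q.to_dict("records")

    if __name__ == "__main__":
        app.run(debug=True)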
3
votes
2 answers

Why and when is deep reinforcement learning needed instead of q-learning?

I've been studying reinforcement learning, and understand the concepts of value/policy iteration, TD(1)/TD(0)/TD(Lambda), and Q-learning. What I don't understand is why Q-learning can't be used for everything. Why do we need "deep" reinforcement…
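
The usual answer: tabular Q-learning stores one value per (state, action) pair, so it only works when the state space is small enough to enumerate and revisit; deep RL replaces the table with a function approximator that generalises across states. A back-of-envelope illustration:

    import math

    # An 84x84 grayscale screen with 256 intensity levels (the DQN input size)
    # has 256 ** (84 * 84) distinct states -- no table can hold that.
    log10_states = 84 * 84 * math.log10(256)
    print(f"roughly 10^{log10_states:.0f} possible screens")  # ~10^16991

    # By contrast, a 4x4 gridworld with 4 actions needs only 64 table entries,
    # which is exactly where tabular Q-learning works well.
    print(4 * 4 * 4)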
3
votes
1 answer

What is the difference between policy gradient methods and neural network-based action-value methods?

What is the difference between policy gradient methods and neural network-based action-value methods?
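
In short: action-value methods learn Q(s, a) and derive the policy implicitly (e.g. epsilon-greedily over the Q-values), while policy-gradient methods parameterise the policy itself and follow the gradient of expected return. A toy sketch of how each one acts (numbers are made up):

    import numpy as np

    q_values = np.array([1.3, 0.7])  # an action-value net outputs one Q per action
    greedy_action = int(np.argmax(q_values))  # value-based: act (epsilon-)greedily

    # A policy net outputs a distribution over actions directly; here we fake
    # one with a softmax over the same numbers purely for illustration.
    probs = np.exp(q_values) / np.exp(q_values).sum()
    sampled_action = int(np.random.choice(len(probs), p=probs))  # PG: sample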
3
votes
2 answers

Why do we need exploitation in RL (Q-learning) for convergence?

I am implementing the Q-learning algorithm and I observed that my Q-values are not converging to the optimal Q-values, even though the policy seems to be converging. I defined the action-selection strategy as epsilon-greedy, and epsilon is decreasing by 1/N…
3
votes
1 answer

Trouble implementing DeepMind's Breakout model

I am trying to follow DeepMind's paper on Q-learning for the game Breakout, and so far the performance is not improving, i.e. it is not learning anything at all. Instead of experience replay, I am just running the game, saving some data, and training and…
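
Dropping experience replay is a likely culprit: in the DQN paper, sampling past transitions uniformly breaks the correlation between consecutive frames and reuses data. A minimal buffer sketch (capacity and batch size are illustrative):

    import random
    from collections import deque

    class ReplayBuffer:
        """FIFO store of transitions; uniform sampling decorrelates updates."""
        def __init__(self, capacity=100_000):
            self.buffer = deque(maxlen=capacity)

        def push(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size=32):
            return random.sample(self.buffer, batch_size)

    # Usage: push every step; once len(buffer) > batch_size, train on sample().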
3
votes
2 answers

Q-learning epsilon-greedy update

I am trying to understand the epsilon-greedy method in DQN. I am learning from the code available at https://github.com/karpathy/convnetjs/blob/master/build/deepqlearn.js. The following is the update rule for epsilon, which changes with age as…
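
To the best of my reading of that file, the rule is a linear anneal over the agent's age; a Python transcription of that scheme (the parameter defaults are my recollection of the JS source, so treat them as assumptions):

    def epsilon_at(age, eps_min=0.05, burnin=3000, total=100_000):
        # 1.0 during the burn-in phase, then a linear decay with age,
        # floored at eps_min once annealing is complete.
        return min(1.0, max(eps_min, 1.0 - (age - burnin) / (total - burnin)))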
3
votes
1 answer

Why would a DQN give similar values to all actions in the action space (2) for all observations

I have a DQN algorithm that learns (the loss converges to 0), but unfortunately it learns a Q-value function such that the Q-values for the 2 possible actions are very similar. It is worth noting that the Q-values change by very…
3
votes
1 answer

How do you update Q-values for a two-player game

For a single-player game, Q-value updates are pretty intuitive: the current state and the future state depend on the strategy of a single player. But for two players this isn't the case. Consider the scenario where the opponent wins and the game is…
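
One common approach for zero-sum, alternating-move games is a negamax-style update: the state after your move belongs to the opponent, whose best value is the negative of yours. A hedged sketch with a shared Q-table whose values are always from the mover's perspective:

    def two_player_q_update(Q, state, action, reward, next_state, next_actions,
                            alpha=0.1, gamma=0.99):
        if not next_actions:  # terminal: no bootstrap, just the final reward
            target = reward
        else:
            # The opponent moves in next_state; their best outcome is our worst.
            target = reward - gamma * max(Q[(next_state, b)] for b in next_actions)
        Q[(state, action)] += alpha * (target - Q[(state, action)])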
3
votes
0 answers

DQN on recommendation system

I want to use DQN in a recommendation system for the retail industry, but the problem is that the state space is time-inhomogeneous and not deterministic (compared to Atari games). I figured out two methods for this problem: make state-transition…
3
votes
1 answer

What is phi in the Deep Q-learning algorithm

I'm trying to make a learning football game from scratch in Java, and I'm trying to implement reinforcement learning with Google DeepMind's Deep Q-learning algorithm (without the convolutional network, though). I've already built the neural network and…
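
In the DQN paper, phi is the fixed preprocessing map from the recent observation history to the network input (there: grayscale, resize to 84x84, stack the last 4 frames). Without a convnet, it is simply whatever deterministic function turns recent observations into the state vector. A sketch for vector observations (class name and stack size are illustrative, and in Python rather than the asker's Java):

    import numpy as np
    from collections import deque

    class Phi:
        """Stack the last k observations into one state vector."""
        def __init__(self, k=4):
            self.frames = deque(maxlen=k)

        def __call__(self, obs):
            self.frames.append(np.asarray(obs, dtype=np.float32))
            while len(self.frames) < self.frames.maxlen:
                self.frames.append(self.frames[-1])  # pad at episode start
            return np.concatenate(list(self.frames))  # input to the Q-network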
3
votes
1 answer

Action selection with softmax?

I know this might be a pretty stupid question to ask, but what the hell… At the moment I am trying to implement a softmax action selector, which uses the Boltzmann distribution. [Formula] What I am a bit unsure about is how you know whether you want to…
Vato • 37 • 1 • 8
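
For reference, a common way to implement Boltzmann (softmax) action selection, with the temperature tau controlling the exploration level:

    import numpy as np

    def softmax_action(q_values, tau=1.0):
        # P(a) is proportional to exp(Q(a) / tau); large tau gives near-uniform
        # exploration, tau -> 0 approaches greedy behaviour.
        q = np.asarray(q_values, dtype=np.float64)
        prefs = (q - q.max()) / tau  # subtract max for numerical stability
        probs = np.exp(prefs) / np.exp(prefs).sum()
        return int(np.random.choice(len(q), p=probs))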