Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function giving the expected utility (cumulative reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well, or how badly, the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states or actions better than currently estimated). A common, simple way of handling this trade-off is an epsilon-greedy policy, sketched below.
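A minimal sketch of epsilon-greedy selection over a tabular action-value function, assuming a Q-table keyed by (state, action) pairs; the function name and default epsilon are illustrative:

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """With probability epsilon explore (uniform random action);
    otherwise exploit (act greedily w.r.t. the current Q-values)."""
    if random.random() < epsilon:
        return random.choice(list(actions))           # explore
    return max(actions, key=lambda a: Q[(state, a)])  # exploit
```

In practice epsilon is often annealed toward zero, so the agent explores heavily early on and exploits its estimates later.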

447 questions
5
votes
3 answers

Deep Reinforcement Learning - CartPole Problem

I tried to implement the simplest Deep Q-Learning algorithm. I think I've implemented it correctly, and I know that Deep Q-Learning struggles with divergence, but the reward is declining very fast and the loss is diverging. I would be grateful if…
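Divergence of this kind in plain deep Q-learning is commonly tamed with experience replay plus a frozen target network. A minimal PyTorch sketch of the target-network half, sized for CartPole's 4-dimensional state and 2 actions; all names, sizes, and the sync schedule are illustrative assumptions:

```python
import copy
import torch

online = torch.nn.Sequential(
    torch.nn.Linear(4, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
target = copy.deepcopy(online)   # frozen copy used only for bootstrap targets

def td_target(r, s_next, done, gamma=0.99):
    """Compute r + gamma * max_a Q_target(s', a), zeroed at episode end."""
    with torch.no_grad():        # never backprop through the target net
        best_next = target(s_next).max(dim=1).values
    return r + gamma * best_next * (1.0 - done)

# every few hundred steps, sync the frozen copy:
# target.load_state_dict(online.state_dict())
```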
5
votes
4 answers

State dependent action set in reinforcement learning

How do people deal with problems where the legal actions in different states differ? In my case I have about 10 actions in total and the legal actions do not overlap, meaning that in certain states the same 3 actions are always legal, and…
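A common tabular workaround is to restrict both action selection and the bootstrap max to the current state's legal set. A minimal sketch, where `legal` (a mapping from state to its legal actions) and the Q-table layout are illustrative assumptions:

```python
import random
from collections import defaultdict

def select_action(Q, legal, state, epsilon=0.1):
    """Epsilon-greedy over only the actions legal in `state`."""
    acts = legal[state]
    if random.random() < epsilon:
        return random.choice(acts)
    return max(acts, key=lambda a: Q[(state, a)])

def q_update(Q, legal, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Tabular update whose max ranges only over s_next's legal actions."""
    best_next = max(Q[(s_next, b)] for b in legal[s_next])
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

Q = defaultdict(float)
legal = {"s0": [0, 1, 2], "s1": [3, 4]}   # non-overlapping legal sets
```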
5
votes
1 answer

Why is there no n-step Q-learning algorithm in Sutton's RL book?

I think I am messing something up. I always thought that:

- 1-step TD on-policy = Sarsa
- 1-step TD off-policy = Q-learning

Thus I conclude:

- n-step TD on-policy = n-step Sarsa
- n-step TD off-policy = n-step Q-learning

In Sutton's book,…
siva
  • 1,183
  • 3
  • 12
  • 28
5
votes
1 answer

Are off-policy learning methods better than on-policy methods?

I cannot understand the fundamental difference between on-policy methods (like A3C) and off-policy methods (like DDPG). As far as I know, off-policy methods can learn the optimal policy regardless of the behavior policy. They can learn by…
DarkZero
  • 2,259
  • 3
  • 25
  • 36
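The difference is easiest to see in the one-step targets: Sarsa bootstraps with the action the behavior policy actually takes next, while Q-learning bootstraps with the greedy action regardless of what the behavior policy does. A minimal tabular sketch (names illustrative):

```python
def sarsa_target(Q, r, s_next, a_next, gamma=0.99):
    # on-policy: uses the next action actually chosen by the behavior policy
    return r + gamma * Q[(s_next, a_next)]

def q_learning_target(Q, r, s_next, actions, gamma=0.99):
    # off-policy: uses the greedy action, whatever the behavior policy did
    return r + gamma * max(Q[(s_next, b)] for b in actions)
```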
5
votes
2 answers

How to understand Watkins's Q(λ) learning algorithm in Sutton & Barto's RL book?

In Sutton & Barto's RL book (link), Watkins's Q(λ) learning algorithm is presented in Figure 7.14. In line 10, "For all s, a:", does "s, a" range over all pairs (s, a), while the (s, a) in lines 8 and 9 refers only to the current (s, a)? Is that right? In line 12…
user186199
  • 115
  • 2
  • 7
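For reference, a tabular sketch of one Watkins's Q(λ) step: the TD error δ and the trace increment use only the current (s, a), while the "For all s, a" loop applies the decayed traces everywhere. A dict of traces stands in for the full table; names are illustrative:

```python
def watkins_q_lambda_step(Q, E, s, a, r, s_next, a_next, actions,
                          alpha=0.1, gamma=0.99, lam=0.9):
    """One tabular Watkins's Q(lambda) update; Q and E map (state, action)
    to value and eligibility trace respectively."""
    a_star = max(actions, key=lambda b: Q[(s_next, b)])
    if Q[(s_next, a_next)] == Q[(s_next, a_star)]:
        a_star = a_next                      # ties broken toward the taken action
    delta = r + gamma * Q[(s_next, a_star)] - Q[(s, a)]  # current (s, a) only
    E[(s, a)] += 1.0                                      # current (s, a) only
    for sa in list(E):                       # "For all s, a"
        Q[sa] += alpha * delta * E[sa]
        E[sa] = gamma * lam * E[sa] if a_next == a_star else 0.0
```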
5
votes
2 answers

Q-learning using neural networks

I'm trying to implement the Deep Q-learning algorithm for a Pong game. I've already implemented Q-learning using a table as the Q-function. It works very well and learns how to beat the naive AI within 10 minutes. But I can't make it work using neural…
5
votes
2 answers

Q-Learning values get too high

I've recently made an attempt to implement a basic Q-Learning algorithm in Golang. Note that I'm new to Reinforcement Learning and AI in general, so the error may very well be mine. Here's how I implemented the solution to an m,n,k-game…
Fardin K.
  • 445
  • 9
  • 19
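One useful sanity check: with per-step rewards bounded by R_max and gamma < 1, tabular Q-values should settle no higher than roughly R_max / (1 - gamma), e.g. 5 / (1 - 0.9) = 50. Values climbing far past that usually mean the update is adding the whole target rather than the TD error. A sketch contrasting the correct form with that common bug (names illustrative):

```python
def q_update(Q, s, a, r, best_next, alpha=0.1, gamma=0.9):
    # correct: nudge Q by alpha times the TD ERROR (target minus estimate)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    # common bug: adding the whole target each step, which (for positive
    # targets) grows without bound:
    # Q[(s, a)] += alpha * (r + gamma * best_next)
```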
5
votes
1 answer

Q-Learning convergence to optimal policy

I am using the RL-Glue based python-rl framework for Q-learning. My understanding is that over a number of episodes, the algorithm converges to an optimal policy (a mapping that says which action to take in which state). Question 1: Does this mean…
okkhoy
  • 1,298
  • 3
  • 16
  • 29
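The policy is implicit in the converged Q-table: acting greedily per state recovers the state-to-action mapping. A minimal sketch (names illustrative):

```python
def greedy_policy(Q, states, actions):
    """Extract the deterministic policy implied by a (converged) Q-table."""
    return {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
```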
5
votes
1 answer

Q-learning in a neural network - Mountain Car

So I've been reading about Q-learning and neural networks. I believe I have the right idea; however, I would like a second opinion on my code for the NN and the Q-value updates. I have created a MATLAB implementation of the Mountain Car…
5
votes
1 answer

SARSA algorithm for average reward problems

My question is about using the SARSA algorithm in reinforcement learning for an undiscounted, continuing (non-episodic) problem (can it be used for such a problem?). I have been studying the textbook by Sutton and Barto, and they show how to modify…
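For continuing tasks, Sutton and Barto replace the discounted target with a differential one: the TD error subtracts a running estimate of the average reward, and no gamma appears. A minimal tabular sketch of that average-reward Sarsa update; step sizes and names are illustrative:

```python
def differential_sarsa_step(Q, s, a, r, s_next, a_next, avg_r,
                            alpha=0.1, beta=0.01):
    """One average-reward Sarsa update; Q changes in place and the
    new average-reward estimate is returned."""
    delta = r - avg_r + Q[(s_next, a_next)] - Q[(s, a)]  # differential TD error
    Q[(s, a)] += alpha * delta
    return avg_r + beta * delta                          # track average reward
```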
4
votes
1 answer

DDPG not converging for a simple control problem

I am trying to solve a control problem with DDPG. The problem is simple enough that I can do value-function iteration for its discretized version, and thus I have the "perfect" solution to compare my results with. But I want to solve the problem…
4
votes
1 answer

How do I calculate MaxQ in Q-learning?

I'm making an implementation of Q-learning, specifically of the Bellman update. I'm using the version from a website that guides you through the problem, but I have a question: for maxQ, do I calculate the max using all Q-table values of the new…
user11105005
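For reference, maxQ in the Bellman update is the largest Q-table value among the new state's actions; it is a learned estimate, not a reward, and it ignores which action will actually be taken next. A minimal sketch (names illustrative):

```python
def max_q(Q, s_next, actions):
    """The best value currently attainable FROM the new state."""
    return max(Q[(s_next, b)] for b in actions)

# target for the visited pair: r + gamma * max_q(Q, s_next, actions)
```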
4
votes
1 answer

OpenAI gym render OSError

I am trying to learn Q-learning by using OpenAI's gym module. But when I try to render my environment, I get the following error: OSError Traceback (most recent call last) in…
Vinay Bharadhwaj
  • 165
  • 1
  • 17
4
votes
1 answer

How do I apply Q-learning to an OpenAI Gym environment where multiple actions are taken at each time step?

I have successfully used Q-learning to solve some classic reinforcement learning environments from OpenAI Gym (e.g. Taxi, CartPole). These environments allow a single action to be taken at each time step. However, I cannot find a way to solve…
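One tabular workaround is to flatten the multi-action space: enumerate every combination of sub-actions as a single joint action, so ordinary Q-learning applies unchanged (only practical while the product stays small). A minimal sketch with itertools.product; the sizes are illustrative:

```python
from itertools import product

sub_action_sizes = [2, 3, 2]   # three discrete choices made simultaneously
joint_actions = list(product(*(range(n) for n in sub_action_sizes)))
# 12 joint actions; index the Q-table by (state, joint_action) and
# unpack the chosen tuple back into per-dimension actions for env.step(...)
```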
4
votes
1 answer

Deep Q Learning For Snake Game

I'm working on a project based on the Keras Plays Catch code. I have changed the game to a simple Snake game, and I represent the snake as a dot on the board for the sake of simplicity. If the snake eats the reward it gets a +5 score, and for hitting a wall it…
Amir_P
  • 8,322
  • 5
  • 43
  • 92