Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function giving the expected utility (cumulative reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well, or how badly, the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states or actions better than currently estimated). A common, simple way of handling this trade-off is an epsilon-greedy policy, sketched below.
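A minimal sketch of epsilon-greedy selection over a tabular action-value function, assuming a Q-table keyed by (state, action) pairs; the function name and default epsilon are illustrative:

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """With probability epsilon explore (uniform random action);
    otherwise exploit (act greedily w.r.t. the current Q-values)."""
    if random.random() < epsilon:
        return random.choice(list(actions))           # explore
    return max(actions, key=lambda a: Q[(state, a)])  # exploit
```

In practice epsilon is often annealed toward zero, so the agent explores heavily early on and exploits its estimates later.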

447 questions
5
votes
3 answers

Deep Reinforcement Learning - CartPole Problem

I tried to implement the simplest Deep Q-Learning algorithm. I think I've implemented it correctly, and I know that Deep Q-Learning struggles with divergence, but the reward is declining very fast and the loss is diverging. I would be grateful if…
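Divergence of this kind in plain deep Q-learning is commonly tamed with experience replay plus a frozen target network. A minimal PyTorch sketch of the target-network half, sized for CartPole's 4-dimensional state and 2 actions; all names, sizes, and the sync schedule are illustrative assumptions:

```python
import copy
import torch

online = torch.nn.Sequential(
    torch.nn.Linear(4, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
target = copy.deepcopy(online)   # frozen copy used only for bootstrap targets

def td_target(r, s_next, done, gamma=0.99):
    """Compute r + gamma * max_a Q_target(s', a), zeroed at episode end."""
    with torch.no_grad():        # never backprop through the target net
        best_next = target(s_next).max(dim=1).values
    return r + gamma * best_next * (1.0 - done)

# every few hundred steps, sync the frozen copy:
# target.load_state_dict(online.state_dict())
```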
5
votes
4 answers

State dependent action set in reinforcement learning

How do people deal with problems where the legal actions in different states differ? In my case I have about 10 actions in total and the legal actions do not overlap, meaning that in certain states the same 3 actions are always legal, and…
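A common tabular workaround is to restrict both action selection and the bootstrap max to the current state's legal set. A minimal sketch, where `legal` (a mapping from state to its legal actions) and the Q-table layout are illustrative assumptions:

```python
import random
from collections import defaultdict

def select_action(Q, legal, state, epsilon=0.1):
    """Epsilon-greedy over only the actions legal in `state`."""
    acts = legal[state]
    if random.random() < epsilon:
        return random.choice(acts)
    return max(acts, key=lambda a: Q[(state, a)])

def q_update(Q, legal, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Tabular update whose max ranges only over s_next's legal actions."""
    best_next = max(Q[(s_next, b)] for b in legal[s_next])
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

Q = defaultdict(float)
legal = {"s0": [0, 1, 2], "s1": [3, 4]}   # non-overlapping legal sets
```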
5
votes
1 answer

Why is there no n-step Q-learning algorithm in Sutton's RL book?

I think I am messing something up. I always thought that:

- 1-step TD on-policy = Sarsa
- 1-step TD off-policy = Q-learning

Thus I conclude:

- n-step TD on-policy = n-step Sarsa
- n-step TD off-policy = n-step Q-learning

In Sutton's book,…
siva
  • 1,183
  • 3
  • 12
  • 28
5
votes
1 answer

Are off-policy learning methods better than on-policy methods?

I cannot understand the fundamental difference between on-policy methods (like A3C) and off-policy methods (like DDPG). As far as I know, off-policy methods can learn the optimal policy regardless of the behavior policy. They can learn by…
DarkZero
  • 2,259
  • 3
  • 25
  • 36
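The difference is easiest to see in the one-step targets: Sarsa bootstraps with the action the behavior policy actually takes next, while Q-learning bootstraps with the greedy action regardless of what the behavior policy does. A minimal tabular sketch (names illustrative):

```python
def sarsa_target(Q, r, s_next, a_next, gamma=0.99):
    # on-policy: uses the next action actually chosen by the behavior policy
    return r + gamma * Q[(s_next, a_next)]

def q_learning_target(Q, r, s_next, actions, gamma=0.99):
    # off-policy: uses the greedy action, whatever the behavior policy did
    return r + gamma * max(Q[(s_next, b)] for b in actions)
```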
5
votes
2 answers

How to understand Watkins's Q(λ) learning algorithm in Sutton & Barto's RL book?

In Sutton & Barto's RL book (link), Watkins's Q(λ) learning algorithm is presented in Figure 7.14. In line 10, "For all s, a:", does "s, a" range over all pairs (s, a), while the (s, a) in lines 8 and 9 refers only to the current (s, a)? Is that right? In line 12…
user186199
  • 115
  • 2
  • 7
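For reference, a tabular sketch of one Watkins's Q(λ) step: the TD error δ and the trace increment use only the current (s, a), while the "For all s, a" loop applies the decayed traces everywhere. A dict of traces stands in for the full table; names are illustrative:

```python
def watkins_q_lambda_step(Q, E, s, a, r, s_next, a_next, actions,
                          alpha=0.1, gamma=0.99, lam=0.9):
    """One tabular Watkins's Q(lambda) update; Q and E map (state, action)
    to value and eligibility trace respectively."""
    a_star = max(actions, key=lambda b: Q[(s_next, b)])
    if Q[(s_next, a_next)] == Q[(s_next, a_star)]:
        a_star = a_next                      # ties broken toward the taken action
    delta = r + gamma * Q[(s_next, a_star)] - Q[(s, a)]  # current (s, a) only
    E[(s, a)] += 1.0                                      # current (s, a) only
    for sa in list(E):                       # "For all s, a"
        Q[sa] += alpha * delta * E[sa]
        E[sa] = gamma * lam * E[sa] if a_next == a_star else 0.0
```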
5
votes
2 answers

Q-learning using neural networks

I'm trying to implement the Deep Q-learning algorithm for a Pong game. I've already implemented Q-learning using a table as the Q-function. It works very well and learns how to beat the naive AI within 10 minutes. But I can't make it work using neural…
5
votes
2 answers

Q-Learning values get too high

I've recently made an attempt to implement a basic Q-Learning algorithm in Golang. Note that I'm new to Reinforcement Learning and AI in general, so the error may very well be mine. Here's how I implemented the solution to an m,n,k-game…
Fardin K.
  • 445
  • 9
  • 19
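One useful sanity check: with per-step rewards bounded by R_max and gamma < 1, tabular Q-values should settle no higher than roughly R_max / (1 - gamma), e.g. 5 / (1 - 0.9) = 50. Values climbing far past that usually mean the update is adding the whole target rather than the TD error. A sketch contrasting the correct form with that common bug (names illustrative):

```python
def q_update(Q, s, a, r, best_next, alpha=0.1, gamma=0.9):
    # correct: nudge Q by alpha times the TD ERROR (target minus estimate)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    # common bug: adding the whole target each step, which (for positive
    # targets) grows without bound:
    # Q[(s, a)] += alpha * (r + gamma * best_next)
```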
5
votes
1 answer

Q-Learning convergence to optimal policy

I am using the RL-Glue based python-rl framework for Q-learning. My understanding is that over a number of episodes, the algorithm converges to an optimal policy (a mapping that says which action to take in which state). Question 1: Does this mean…
okkhoy
  • 1,298
  • 3
  • 16
  • 29
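The policy is implicit in the converged Q-table: acting greedily per state recovers the state-to-action mapping. A minimal sketch (names illustrative):

```python
def greedy_policy(Q, states, actions):
    """Extract the deterministic policy implied by a (converged) Q-table."""
    return {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
```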
5
votes
1 answer

Q-learning in a neural network - Mountain Car

So I've been reading about Q-learning and neural networks. I believe I have the right idea; however, I would like a second opinion on my code for the NN and the Q-value updates. I have created a MATLAB implementation of the Mountain Car…
5
votes
1 answer

SARSA algorithm for average reward problems

My question is about using the SARSA algorithm in reinforcement learning for an undiscounted, continuing (non-episodic) problem (can it be used for such a problem?). I have been studying the textbook by Sutton and Barto, and they show how to modify…
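For continuing tasks, Sutton and Barto replace the discounted target with a differential one: the TD error subtracts a running estimate of the average reward, and no gamma appears. A minimal tabular sketch of that average-reward Sarsa update; step sizes and names are illustrative:

```python
def differential_sarsa_step(Q, s, a, r, s_next, a_next, avg_r,
                            alpha=0.1, beta=0.01):
    """One average-reward Sarsa update; Q changes in place and the
    new average-reward estimate is returned."""
    delta = r - avg_r + Q[(s_next, a_next)] - Q[(s, a)]  # differential TD error
    Q[(s, a)] += alpha * delta
    return avg_r + beta * delta                          # track average reward
```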
4
votes
1 answer

DDPG not converging for a simple control problem

I am trying to solve a control problem with DDPG. The problem is simple enough that I can do value-function iteration for its discretized version, and thus I have the "perfect" solution to compare my results with. But I want to solve the problem…
4
votes
1 answer

How do I calculate MaxQ in Q-learning?

I'm making an implementation of Q-learning, specifically of the Bellman update. I'm using the version from a website that guides you through the problem, but I have a question: for maxQ, do I calculate the max using all Q-table values of the new…
user11105005
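For reference, maxQ in the Bellman update is the largest Q-table value among the new state's actions; it is a learned estimate, not a reward, and it ignores which action will actually be taken next. A minimal sketch (names illustrative):

```python
def max_q(Q, s_next, actions):
    """The best value currently attainable FROM the new state."""
    return max(Q[(s_next, b)] for b in actions)

# target for the visited pair: r + gamma * max_q(Q, s_next, actions)
```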
4
votes
1 answer

OpenAI gym render OSError

I am trying to learn Q-learning by using OpenAI's gym module. But when I try to render my environment, I get the following error: OSError Traceback (most recent call last) in…
Vinay Bharadhwaj
  • 165
  • 1
  • 17
4
votes
1 answer

How do I apply Q-learning to an OpenAI Gym environment where multiple actions are taken at each time step?

I have successfully used Q-learning to solve some classic reinforcement learning environments from OpenAI Gym (e.g. Taxi, CartPole). These environments allow a single action to be taken at each time step. However, I cannot find a way to solve…
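One tabular workaround is to flatten the multi-action space: enumerate every combination of sub-actions as a single joint action, so ordinary Q-learning applies unchanged (only practical while the product stays small). A minimal sketch with itertools.product; the sizes are illustrative:

```python
from itertools import product

sub_action_sizes = [2, 3, 2]   # three discrete choices made simultaneously
joint_actions = list(product(*(range(n) for n in sub_action_sizes)))
# 12 joint actions; index the Q-table by (state, joint_action) and
# unpack the chosen tuple back into per-dimension actions for env.step(...)
```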
4
votes
1 answer

Deep Q Learning For Snake Game

I'm working on a project based on the Keras Plays Catch code. I have changed the game to a simple Snake game, and I represent the snake as a dot on the board for the sake of simplicity. If the snake eats the reward it gets a +5 score, and for hitting a wall it…
Amir_P
  • 8,322
  • 5
  • 43
  • 92