Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to learn an action-value function giving the expected utility (cumulative reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well, or how badly, the agent is performing). During the learning process, the agent must balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states or actions better than those currently estimated). A common, simple way to handle this trade-off is an epsilon-greedy policy, illustrated below.
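As a concrete illustration of the epsilon-greedy policy just mentioned, a minimal sketch in Python (the dictionary-based Q-table and the epsilon value of 0.1 are assumptions for illustration, not taken from any question below):

    import random

    def epsilon_greedy(Q, state, actions, epsilon=0.1):
        # with probability epsilon act randomly (explore) ...
        if random.random() < epsilon:
            return random.choice(actions)
        # ... otherwise act greedily w.r.t. current Q estimates (exploit)
        return max(actions, key=lambda a: Q.get((state, a), 0.0))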

447 questions
10
votes
2 answers

Q-learning in game not working as expected

I have attempted to implement Q-learning into a simple game I have written. The game is based around the player having to "jump" to avoid oncoming boxes. I have designed the system with two actions, jump and do_nothing, and the states are the…
Jack Wilsdon • 6,706
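A minimal sketch of how such a two-action Q-table and its update could look (the state encoding, e.g. the distance to the next box, is a hypothetical placeholder, since the question's state definition is truncated above):

    from collections import defaultdict

    ACTIONS = ["jump", "do_nothing"]
    Q = defaultdict(float)   # Q[(state, action)] defaults to 0.0 for unseen pairs

    def update(state, action, reward, next_state, alpha=0.1, gamma=0.9):
        # standard one-step Q-learning backup over the two actions
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])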
9
votes
2 answers

Q Learning Algorithm for Tic Tac Toe

I could not understand how to update Q values for the tic-tac-toe game. I have read all about it, but I could not picture how to do it. I read that the Q value is updated at the end of the game, but I don't understand how that works if there is a Q value for each action?
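What "updated at the end of the game" usually means here is a Monte-Carlo-style backup: record every (state, action) pair played, then, once the terminal reward is known, walk the episode backwards so earlier moves receive a discounted share of it. A sketch, assuming Q is a dict-like table:

    def update_episode(Q, episode, final_reward, alpha=0.1, gamma=0.9):
        # episode: list of (state, action) pairs in the order they were played
        G = final_reward                 # return observed at the end of the game
        for state, action in reversed(episode):
            Q[(state, action)] += alpha * (G - Q[(state, action)])
            G *= gamma                   # discount as we move toward the opening move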
8
votes
2 answers

Q Learning Applied To a Two Player Game

I am trying to implement a Q Learning agent to learn an optimal policy for playing against a random agent in a game of Tic Tac Toe. I have created a plan that I believe will work. There is just one part that I cannot get my head around. And this…
Frederick • 115
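The step that is usually hardest to get one's head around is where the opponent's move belongs. One workable answer is to fold the random opponent into the environment, so the "next state" seen by the Q-update is the board after the opponent has replied. A self-contained sketch (the 9-character string board and the +1/0/-1 rewards are assumptions):

    import random

    LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

    def winner(board):
        for a, b, c in LINES:
            if board[a] != " " and board[a] == board[b] == board[c]:
                return board[a]
        return None

    def legal_moves(board):
        return [i for i, cell in enumerate(board) if cell == " "]

    def env_step(board, action):
        # Agent ("X") moves, then a random opponent ("O") replies; both
        # half-moves together form one environment transition for the Q-update.
        board = board[:action] + "X" + board[action + 1:]
        if winner(board) == "X":
            return board, 1.0, True
        if not legal_moves(board):
            return board, 0.0, True      # draw
        reply = random.choice(legal_moves(board))
        board = board[:reply] + "O" + board[reply + 1:]
        if winner(board) == "O":
            return board, -1.0, True
        return board, 0.0, not legal_moves(board)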
8
votes
1 answer

Questions about Q-Learning using Neural Networks

I have implemented Q-Learning as described in http://web.cs.swarthmore.edu/~meeden/cs81/s12/papers/MarkStevePaper.pdf. In order to approximate Q(S,A) I use a neural network structure like the following: activation sigmoid; inputs: number of inputs + 1…
7
votes
2 answers

Criteria for convergence in Q-learning

I am experimenting with the Q-learning algorithm. I have read from different sources and understood the algorithm; however, there seems to be no clear, mathematically backed convergence criterion. Most sources recommend iterating several times…
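There is indeed no universally agreed stopping rule for tabular Q-learning; a common practical proxy is to stop when the largest change in any Q entry between sweeps falls below a tolerance. A sketch (the tolerance value is an assumption):

    def converged(Q_old, Q_new, tol=1e-4):
        # largest absolute change across all (state, action) entries
        keys = set(Q_old) | set(Q_new)
        delta = max((abs(Q_new.get(k, 0.0) - Q_old.get(k, 0.0)) for k in keys),
                    default=0.0)
        return delta < tol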
7
votes
0 answers

Why does my agent always take the same action in DQN - Reinforcement Learning

I have trained an RL agent using the DQN algorithm. After 20000 episodes my rewards have converged. Now when I test this agent, it always takes the same action, irrespective of the state. I find this very weird. Can someone help me with this? Is…
chink • 1,505
7
votes
2 answers

Deep Q Network is not learning

I tried to code a Deep Q Network to play Atari games using TensorFlow and OpenAI's Gym. Here's my code: import tensorflow as tf import gym import numpy as np import os env_name = 'Breakout-v0' env = gym.make(env_name) num_episodes = 100 input_data…
7
votes
2 answers

RL Activation Functions with Negative Rewards

I have a question regarding appropriate activation functions for environments that have both positive and negative rewards. In reinforcement learning, our output, I believe, should be the expected reward for all possible actions. Since some options…
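The short answer usually given: hidden layers may use any nonlinearity, but the output layer should be linear, because a sigmoid is confined to (0, 1) and can never emit a negative Q-value. A minimal Keras sketch of that shape (layer sizes are placeholders, not taken from the question):

    import tensorflow as tf

    def build_q_network(state_dim, num_actions):
        return tf.keras.Sequential([
            tf.keras.Input(shape=(state_dim,)),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(64, activation="relu"),
            # default (linear) activation: outputs are unbounded, so
            # negative expected rewards stay representable
            tf.keras.layers.Dense(num_actions),
        ])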
7
votes
1 answer

list index out of range error using random.choice

I'm getting the error below when I run my program, which has the function defined below in it. I think it's the valid_actions = filter(lambda x: x != random.choice(maxQactions)) part that's causing the error. Does anyone see what the issue is, or…
user3476463 • 3,967
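Two likely failure modes in that line: random.choice raises IndexError on an empty sequence, and in Python 3 filter returns a lazy iterator rather than a list. A defensive sketch (function and variable names are hypothetical, echoing the question):

    import random

    def pick_action(max_q_actions, all_actions):
        candidates = list(max_q_actions)     # filter() is lazy in Python 3
        # random.choice raises IndexError on an empty sequence,
        # so fall back to the full action set in that case
        return random.choice(candidates if candidates else list(all_actions))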
7
votes
1 answer

How to implement q-learning in R?

I am learning about q-learning and found a Wikipedia post and this website. According to the tutorials and pseudocode, I wrote this much in R: #q-learning…
Eka • 14,170
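For reference, the pseudocode in most tutorials boils down to the loop below, sketched here in Python to match the rest of this page; it translates almost line for line to R. The environment interface (reset, step, actions) is an assumption:

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
        # env is assumed to expose reset() -> state, a list env.actions,
        # and step(state, action) -> (next_state, reward, done)
        Q = defaultdict(float)
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                if random.random() < epsilon:
                    action = random.choice(env.actions)
                else:
                    action = max(env.actions, key=lambda a: Q[(state, a)])
                next_state, reward, done = env.step(state, action)
                best_next = max(Q[(next_state, a)] for a in env.actions)
                Q[(state, action)] += alpha * (reward + gamma * best_next
                                               - Q[(state, action)])
                state = next_state
        return Q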
6
votes
1 answer

DQN Pytorch Loss keeps increasing

I am implementing a simple DQN algorithm using PyTorch, to solve the CartPole environment from gym. I have been debugging for a while now, and I can't figure out why the model is not learning. Observations: using SmoothL1Loss performs worse than…
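A frequent cause of a rising DQN loss is bootstrapping targets from the very network being optimized. The standard remedy, sketched below rather than taken from the asker's code, is a periodically synchronized target network (the 4-input/2-output sizes assume CartPole):

    import copy
    import torch

    policy_net = torch.nn.Sequential(
        torch.nn.Linear(4, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
    target_net = copy.deepcopy(policy_net)   # frozen copy used only for targets
    target_net.eval()

    def td_targets(rewards, next_states, dones, gamma=0.99):
        # targets come from the target network and carry no gradient
        with torch.no_grad():
            next_q = target_net(next_states).max(dim=1).values
        return rewards + gamma * next_q * (1.0 - dones)

    # every N optimization steps, re-synchronize:
    # target_net.load_state_dict(policy_net.state_dict())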
6
votes
1 answer

Something wrong with Keras code Q-learning OpenAI gym FrozenLake

Maybe my question will seem stupid. I'm studying the Q-learning algorithm. In order to better understand it, I'm trying to remake the TensorFlow code of this FrozenLake example into Keras code. My code: import gym import numpy as np import…
6
votes
3 answers

Learning rate of a Q learning agent

The question is how the learning rate influences the convergence rate and convergence itself. If the learning rate is constant, will the Q function converge to the optimal one, or must the learning rate necessarily decay to guarantee convergence?
uduck • 149
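The classical result (Watkins and Dayan, 1992) requires the learning rate for each state-action pair to satisfy sum(alpha) = infinity and sum(alpha^2) < infinity; a constant rate keeps adapting but does not converge exactly, while a per-pair decay such as 1/(1 + visits) meets both conditions. A sketch:

    from collections import defaultdict

    visits = defaultdict(int)

    def learning_rate(state, action):
        # alpha_n = 1/n per (state, action) pair: the harmonic series
        # diverges while the series of its squares converges
        visits[(state, action)] += 1
        return 1.0 / visits[(state, action)]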
6
votes
1 answer

Implementing reinforcement learning in NetLogo (Learning in multi-agent models)

I am thinking of implementing a learning strategy for different types of agents in my model. To be honest, I still do not know what kind of questions I should ask first or where to start. I have two types of agents that I want to learn by…
6
votes
3 answers

Unbounded increase in Q-Value, consequence of recurrent reward after repeating the same action in Q-Learning

I'm in the process of developing a simple Q-Learning implementation for a trivial application, but there's something that keeps puzzling me. Let's consider the standard formulation of Q-Learning: Q(S, A) = Q(S, A) + alpha * [R + MaxQ(S', A') -…
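The usual diagnosis: as quoted, there is no discount factor gamma on the max term. With gamma < 1, even a reward repeated forever drives Q no higher than R / (1 - gamma), so the value cannot grow without bound. A quick numeric check of that fixed point (constants chosen for illustration):

    alpha, gamma, R = 0.1, 0.9, 1.0
    q = 0.0
    for _ in range(10000):
        # same state, same action, same reward, forever
        q += alpha * (R + gamma * q - q)
    print(q, R / (1 - gamma))   # both approach 10.0, the fixed point R / (1 - gamma)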