Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to learn an action-value function giving the expected utility (cumulative reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well or how badly the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) vs. exploration (acting randomly to discover new states or actions better than currently estimated). A common, simple way to handle this trade-off is an epsilon-greedy policy.
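For illustration, a minimal tabular sketch of the epsilon-greedy policy and the Q-learning update described above (the function names `epsilon_greedy` and `q_update` and the toy states are mine, not from any particular library):

```python
import random
from collections import defaultdict

def epsilon_greedy(q, state, actions, epsilon=0.1):
    """With probability epsilon pick a random action (exploration),
    otherwise pick the action with the highest current Q-value (exploitation)."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

q = defaultdict(float)        # Q-table; unseen (state, action) pairs default to 0.0
actions = [0, 1]
q_update(q, "s0", 1, 1.0, "s1", actions)   # observed reward 1.0 for action 1 in s0
print(q[("s0", 1)])                        # 0.5 * (1.0 + 0.9 * 0.0 - 0.0) = 0.5
```

Because the update bootstraps from `max_a' Q(s', a')` rather than from the action the behavior policy actually takes next, the learned values target the greedy policy even while exploring, which is exactly what makes Q-learning off-policy.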

447 questions
3
votes
1 answer

Q-learning for a Ludo game?

I am at the moment trying to implement an AI player using Q-learning to play against 2 different random players. I am not sure Q-learning is applicable to a Ludo game, which is why I am a bit doubtful about it. I have for the game defined 11…
Lamda
  • 914
  • 3
  • 13
  • 39
3
votes
1 answer

Grid World representation for a neural network

I'm trying to come up with a better representation for the state of a 2-d grid world for a Q-learning algorithm which utilizes a neural network for the Q-function. In the tutorial, Q-learning with Neural Networks, the grid is represented as a 3-d…
Galen
  • 499
  • 5
  • 14
3
votes
1 answer

Adding constraints in Q-learning and assigning rewards if constraints are violated

I took an RL course recently and I am writing a Q-learning controller for a power management application where I have continuous states and discrete actions. I am using a neural network (Q-network) for approximating the action values and selecting…
3
votes
1 answer

Tensorflow implementation of loss of Q-network with slicing

I'm implementing a Q-network as described in Human-level control through deep reinforcement learning (Mnih et al. 2015) in TensorFlow. To approximate the Q-function they use a neural network. The Q-function maps a state and an action to a scalar…
3
votes
1 answer

Deep Neural Network combined with Q-learning

I'm using joint positions from a Kinect camera as my state space, but I think it's going to be too large (25 joints x 30 per second) to just feed into SARSA or Q-learning. Right now I'm using the Kinect Gesture Builder program which uses Supervised…
3
votes
1 answer

Difference between batch Q-learning and growing batch Q-learning

I am confused about the difference between batch and growing batch Q-learning. Also, if I only have historical data, can I implement growing batch Q-learning? Thank you!
ChiefsCreation
  • 389
  • 1
  • 3
  • 10
3
votes
4 answers

Q learning: Relearning after changing the environment

I have implemented Q-learning on a grid of size (n x n) with a single reward of 100 in the middle. The agent learns for 1000 epochs to reach the goal with the following policy: he chooses with probability 0.8 the move with the highest…
3
votes
1 answer

Is the Q-learning algorithm's implementation recursive?

I am trying to implement Q-learning. The general algorithm from here is as below. I just don't get whether I should implement the above statement of the original pseudo-code recursively for all next states which current…
dariush
  • 3,191
  • 3
  • 24
  • 43
3
votes
4 answers

Is Q-learning without a final state even possible?

I have to solve this problem with Q-learning. Well, actually I have to evaluate a Q-learning-based policy on it. I am a tourist manager. I have n hotels, each of which can hold a different number of persons. For each person I put in a hotel I get a…
3
votes
2 answers

Q-learning - Defining states and rewards

I need some help with solving a problem that uses the Q-learning algorithm. Problem description: I have a rocket simulator where the rocket is taking random paths and also crashes sometimes. The rocket has 3 different engines that can be either on…
mrjasmin
  • 1,230
  • 6
  • 21
  • 37
2
votes
1 answer

Can you limit the number of actions when using Q-learning?

I am currently implementing Q-learning to solve a maze which contains fires that start randomly. Would it be considered proper for me to code the action to not be an option for the agent if there is a fire in that direction, or should my reward…
2
votes
2 answers

Q-table representation for nested lists as states and tuples as actions

How can I create a Q-table, when my states are lists and actions are tuples? Example of states for N = 3 [[1], [2], [3]] [[1], [2, 3]] [[1], [3, 2]] [[2], [3, 1]] [[1, 2, 3]] Example of actions for those states [[1], [2], [3]] -> (1, 2), (1, 3),…
John Doe
  • 21
  • 2
2
votes
0 answers

Why is my DQN (Deep Q Network) not learning?

I am training a DQN (Deep Q Network) on a CartPole problem from OpenAI's gym, but when I start the training, the total score from an episode decreases, instead of increasing. I don't know if it is helpful but I noticed that the AI prefers one action…
2
votes
2 answers

How to Learn the Reward Function in a Markov Decision Process

What's the appropriate way to update your R(s) function during Q-learning? For example, say an agent visits state s1 five times, and receives rewards [0,0,1,1,0]. Should I calculate the mean reward, e.g. R(s1) = sum([0,0,1,1,0])/5? Or should I use a…
Cerin
  • 60,957
  • 96
  • 316
  • 522
2
votes
1 answer

Target values to train against in Deep Q Network

I understand the whole gist of Q-learning and its update equation: Q(s, a) = r + \gamma * max_a' Q(s', a'), where s is the current state, a is the action taken, r is the reward, s' is the next state as a result of the action, and we maximize…
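The target value quoted in that excerpt, r + \gamma * max_a' Q(s', a'), is typically computed for a whole batch of transitions when training a Deep Q Network. A minimal NumPy sketch under my own naming (`dqn_targets` and the toy arrays are illustrative, not from any library):

```python
import numpy as np

def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """Compute y = r + gamma * max_a' Q(s', a') for a batch of transitions,
    zeroing the bootstrap term at terminal states (done flag = 1)."""
    return rewards + gamma * next_q_values.max(axis=1) * (1.0 - dones)

rewards = np.array([1.0, 0.0])
next_q = np.array([[0.5, 2.0],    # Q-values of next state for transition 1
                   [1.0, 3.0]])   # Q-values of next state for transition 2
dones = np.array([0.0, 1.0])      # second transition ends the episode
print(dqn_targets(rewards, next_q, dones))   # [1 + 0.99*2.0, 0.0] = [2.98, 0.0]
```

The network is then regressed toward these fixed targets (they are treated as constants, with no gradient flowing through the max term).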