Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function giving the expected utility (reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well or how badly the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily in terms of the current action-value function) vs exploration (acting randomly to discover new states or better actions than currently estimated). A common simple way of handling this trade-off is an epsilon-greedy policy.
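For illustration, here is a minimal epsilon-greedy sketch in Python; the dict-based Q table and the function name are assumptions for this example, not part of any particular library:

    import random

    # Minimal epsilon-greedy action selection (sketch). `Q` is assumed to be a
    # dict mapping (state, action) pairs to estimated action values.
    def epsilon_greedy(Q, state, actions, epsilon=0.1):
        if random.random() < epsilon:
            return random.choice(actions)  # explore: pick a random action
        # exploit: pick the action with the highest current value estimate
        return max(actions, key=lambda a: Q.get((state, a), 0.0))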

447 questions
2 votes · 1 answer

Is Deep Q Learning appropriate for solving the Cartpole task?

I'm new to Reinforcement Learning. Recently, I've been trying to train a Deep Q Network to solve OpenAI Gym's CartPole-v0, where solving means achieving an average score of at least 195.0 over 100 consecutive episodes. I am using a 2-layer neural…
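(For reference, the "solved" criterion quoted above can be tracked with a rolling window of episode returns; a minimal sketch, with hypothetical names:)

    from collections import deque

    scores = deque(maxlen=100)  # keep only the last 100 episode returns

    def record(episode_return):
        # True once the average of the last 100 episodes reaches 195.0,
        # the CartPole-v0 "solved" threshold quoted in the question.
        scores.append(episode_return)
        return len(scores) == 100 and sum(scores) / 100 >= 195.0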
2 votes · 0 answers

Debugging Deep Q-Learning CNN

I'm quite new to TF. I have written a Deep Q-Learning CNN to control a simple driving simulator. I've managed to plot the weights, biases and outputs of my fully connected layers on TensorBoard; however, I'm not sure what I'm looking for. I've…
2 votes · 1 answer

Minibatching in Stochastic Gradient Descent and in Q-Learning

Background (may be skipped): When training neural networks, stochastic gradient descent (SGD) is usually used: instead of computing the network's error on all members of the training set and updating the weights by gradient descent (which means…
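(The contrast the question draws, full-batch gradient descent vs. minibatch SGD, looks roughly like this sketch; `grad_fn` is a hypothetical gradient routine, not a library call:)

    import numpy as np

    def sgd_epoch(X, y, weights, grad_fn, lr=0.01, batch_size=32):
        # One epoch of minibatch SGD: shuffle, then update on each small batch
        # instead of computing the gradient over the entire training set.
        idx = np.random.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            weights -= lr * grad_fn(weights, X[batch], y[batch])
        return weights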
2 votes · 0 answers

Why are Q-Values of actions so close to each other in Deep Q-learning?

I'm trying to train a DRL agent to play a game using the DQN method. The game is pretty straightforward and similar to Breakout. Fruits keep falling from the top of the screen (vertically) and the agent just needs to align itself with the fruit to…
2 votes · 1 answer

Is this a correct implementation of Q-Learning for Checkers?

I am trying to understand Q-Learning. My current algorithm operates as follows: 1. A lookup table is maintained that maps each state to information about its immediate reward and utility for each available action. 2. At each state, check to see if it…
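(A lookup table like the one described in step 1 is often just a nested dict; a minimal sketch of the tabular update, with assumed alpha and gamma:)

    from collections import defaultdict

    alpha, gamma = 0.1, 0.9      # assumed learning rate and discount factor
    table = defaultdict(dict)    # table[state][action] -> value estimate

    def update(state, action, reward, next_state, next_actions):
        old = table[state].get(action, 0.0)
        best_next = max((table[next_state].get(a, 0.0) for a in next_actions),
                        default=0.0)
        # standard tabular Q-learning step toward reward + discounted best next value
        table[state][action] = old + alpha * (reward + gamma * best_next - old)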
2 votes · 0 answers

How to teach neural network a policy for a board game using reinforcement learning?

I need to use reinforcement learning to teach a neural net a policy for a board game. I chose Q-learning as the specific algorithm. I'd like the neural net to have the following structure: layer - rows * cols + 1 neurons - input - values of…
2 votes · 1 answer

PyBrain's Q-Learning maze example: state values and the global policy

I am trying out the PyBrain maze example. My setup is: envmatrix = [[...]]; env = Maze(envmatrix, (1, 8)); task = MDPMazeTask(env); table = ActionValueTable(states_nr, actions_nr); table.initialize(0.); learner = Q(); agent = LearningAgent(table,…
2 votes · 1 answer

Effect of different epsilon values for Q-learning and SARSA

Since I am a beginner in this field, I have a question about how different epsilon values affect SARSA and Q-learning when using the epsilon-greedy algorithm for action selection. I understand that when epsilon is…
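(The epsilon question hinges on the one place the two algorithms differ: their update target. A sketch, assuming a dict-based Q keyed by (state, action):)

    # Q-learning bootstraps from the greedy action in the next state;
    # SARSA bootstraps from the action the epsilon-greedy policy actually took.
    def q_learning_target(Q, reward, next_state, actions, gamma=0.9):
        return reward + gamma * max(Q.get((next_state, a), 0.0) for a in actions)

    def sarsa_target(Q, reward, next_state, next_action, gamma=0.9):
        return reward + gamma * Q.get((next_state, next_action), 0.0)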
2 votes · 1 answer

Python: access a dictionary that has two keys using only one key

I am currently working with Q-learning and I have a dictionary Q[state, action] where each state can be anything, e.g. a string, number, or list, depending on the application. Each state has either 3 or 4 possible actions. For each state I need to find…
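(Two common approaches, sketched here with hashable states assumed:)

    # Option 1: scan the flat dict, filtering on the state half of the key.
    def best_action(Q, state):
        candidates = {a: v for (s, a), v in Q.items() if s == state}
        return max(candidates, key=candidates.get) if candidates else None

    # Option 2: restructure as a dict of dicts, Q2[state][action] = value,
    # so a single state lookup returns all of that state's actions at once.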
2 votes · 1 answer

Solving GridWorld using Q-Learning and function approximation

I'm studying the simple GridWorld problem (3x4, as described in Russell & Norvig Ch. 21.2); I've solved it using Q-Learning and a Q-table, and now I'd like to use a function approximator instead of a matrix. I'm using MATLAB and have tried both…
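(A linear approximator is the usual first step beyond a table; a minimal Python sketch with one weight vector per action and assumed sizes for the 3x4 grid:)

    import numpy as np

    n_features, n_actions = 12, 4    # assumed: one-hot over 12 cells, 4 moves
    w = np.zeros((n_actions, n_features))

    def td_update(phi_s, a, reward, phi_next, done, alpha=0.05, gamma=0.9):
        # Q(s, a) = w[a] . phi(s); gradient-style TD update of that action's weights
        target = reward if done else reward + gamma * np.max(w @ phi_next)
        w[a] += alpha * (target - w[a] @ phi_s) * phi_s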
2 votes · 2 answers

Q Learning Grid World Scenario

I'm researching GridWorld from a Q-learning perspective. I have issues regarding the following question: 1) In the grid-world example, rewards are positive for goals, negative for running into the edge of the world, and zero the rest of the time.…
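(The reward scheme quoted in the question amounts to something like this sketch, with assumed magnitudes:)

    def reward(next_pos, moved, goal):
        # Positive at the goal, negative for running into the edge, zero otherwise.
        if next_pos == goal:
            return 1.0     # assumed goal reward
        if not moved:      # the attempted move ran into the edge of the world
            return -1.0    # assumed edge penalty
        return 0.0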
2 votes · 1 answer

Q-learning implementation

I am trying to implement Q-learning in an environment where the rewards R are stochastic, time-dependent variables that arrive in real time after a constant time interval deltaT. States S (scalars) also arrive after a constant time interval deltaT.…
2 votes · 1 answer

Q-learning: What is the correct state for reward calculation

Q-learning rewards. I'm struggling to interpret the pseudocode for the Q-learning algorithm:

    1  For each s, a initialize table entry Q(a, s) = 0
    2  Observe current state s
    3  Do forever:
    4      Select an action a and execute it
    5      Receive…
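(A direct, minimal Python translation of that pseudocode; `env` is an assumed interface whose step(action) returns (next_state, reward, done):)

    import random
    from collections import defaultdict

    def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
        Q = defaultdict(float)          # Q[(s, a)], implicitly 0 for unseen pairs
        for _ in range(episodes):
            s = env.reset()             # observe current state s
            done = False
            while not done:
                if random.random() < epsilon:
                    a = random.choice(actions)                 # explore
                else:
                    a = max(actions, key=lambda x: Q[(s, x)])  # act greedily
                s2, r, done = env.step(a)  # execute the action, receive reward
                best_next = max(Q[(s2, x)] for x in actions)
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s2
        return Q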
2 votes · 1 answer

Q Learning Algorithm Issue

I'm trying to implement a simple Q-learning algorithm, but for whatever reason it doesn't converge. The agent should basically get from one point on the 5x5 grid to the goal. When I run it, it seems to have found the optimal path; however, it doesn't…
2 votes · 2 answers

Q-learning (multiple goals)

I have just started to study Q-learning and see the possibility of using Q-learning to solve my problem. Problem: I am supposed to detect a certain combination of data; I have four matrices that act as input to my system, and I have already…