Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function giving the expected utility (reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well or how badly the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily in terms of the current action-value function) vs exploration (acting randomly to discover new states or better actions than currently estimated). A common simple way of handling this trade-off is an epsilon-greedy policy.
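For illustration, here is a minimal epsilon-greedy sketch in Python; the dict-based Q table and the function name are assumptions for this example, not part of any particular library:

    import random

    # Minimal epsilon-greedy action selection (sketch). `Q` is assumed to be a
    # dict mapping (state, action) pairs to estimated action values.
    def epsilon_greedy(Q, state, actions, epsilon=0.1):
        if random.random() < epsilon:
            return random.choice(actions)  # explore: pick a random action
        # exploit: pick the action with the highest current value estimate
        return max(actions, key=lambda a: Q.get((state, a), 0.0))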

447 questions
2 votes · 1 answer

Is Deep Q Learning appropriate for solving the Cartpole task?

I'm new to Reinforcement Learning. Recently, I've been trying to train a Deep Q Network to solve OpenAI Gym's CartPole-v0, where solving means achieving an average score of at least 195.0 over 100 consecutive episodes. I am using a 2-layer neural…
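(For reference, the "solved" criterion quoted above can be tracked with a rolling window of episode returns; a minimal sketch, with hypothetical names:)

    from collections import deque

    scores = deque(maxlen=100)  # keep only the last 100 episode returns

    def record(episode_return):
        # True once the average of the last 100 episodes reaches 195.0,
        # the CartPole-v0 "solved" threshold quoted in the question.
        scores.append(episode_return)
        return len(scores) == 100 and sum(scores) / 100 >= 195.0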
2 votes · 0 answers

Debugging Deep Q-Learning CNN

I'm quite new to TF. I have written a Deep Q-Learning CNN to control a simple driving simulator. I've managed to plot the weights, biases and outputs of my fully connected layers on TensorBoard; however, I'm not sure what I'm looking for. I've…
2 votes · 1 answer

Minibatching in Stochastic Gradient Descent and in Q-Learning

Background (may be skipped): When training neural networks, stochastic gradient descent (SGD) is usually used: instead of computing the network's error on all members of the training set and updating the weights by gradient descent (which means…
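(The contrast the question draws, full-batch gradient descent vs. minibatch SGD, looks roughly like this sketch; `grad_fn` is a hypothetical gradient routine, not a library call:)

    import numpy as np

    def sgd_epoch(X, y, weights, grad_fn, lr=0.01, batch_size=32):
        # One epoch of minibatch SGD: shuffle, then update on each small batch
        # instead of computing the gradient over the entire training set.
        idx = np.random.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            weights -= lr * grad_fn(weights, X[batch], y[batch])
        return weights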
2 votes · 0 answers

Why are Q-Values of actions so close to each other in Deep Q-learning?

I'm trying to train a DRL agent to play a game using the DQN method. The game is pretty straightforward and similar to Breakout. Fruits keep falling from the top of the screen (vertically) and the agent just needs to align itself with the fruit to…
2 votes · 1 answer

Is this a correct implementation of Q-Learning for Checkers?

I am trying to understand Q-Learning. My current algorithm operates as follows: 1. A lookup table is maintained that maps each state to information about its immediate reward and utility for each available action. 2. At each state, check to see if it…
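(A lookup table like the one described in step 1 is often just a nested dict; a minimal sketch of the tabular update, with assumed alpha and gamma:)

    from collections import defaultdict

    alpha, gamma = 0.1, 0.9      # assumed learning rate and discount factor
    table = defaultdict(dict)    # table[state][action] -> value estimate

    def update(state, action, reward, next_state, next_actions):
        old = table[state].get(action, 0.0)
        best_next = max((table[next_state].get(a, 0.0) for a in next_actions),
                        default=0.0)
        # standard tabular Q-learning step toward reward + discounted best next value
        table[state][action] = old + alpha * (reward + gamma * best_next - old)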
2 votes · 0 answers

How to teach neural network a policy for a board game using reinforcement learning?

I need to use reinforcement learning to teach a neural net a policy for a board game. I chose Q-learning as the specific algorithm. I'd like the neural net to have the following structure: layer - rows * cols + 1 neurons - input - values of…
2 votes · 1 answer

PyBrain's Q-Learning maze example: state values and the global policy

I am trying out the PyBrain maze example. My setup is: envmatrix = [[...]]; env = Maze(envmatrix, (1, 8)); task = MDPMazeTask(env); table = ActionValueTable(states_nr, actions_nr); table.initialize(0.); learner = Q(); agent = LearningAgent(table,…
2 votes · 1 answer

Effect of different epsilon values for Q-learning and SARSA

Since I am a beginner in this field, I have a question about how different epsilon values affect SARSA and Q-learning when using the epsilon-greedy algorithm for action selection. I understand that when epsilon is…
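(The epsilon question hinges on the one place the two algorithms differ: their update target. A sketch, assuming a dict-based Q keyed by (state, action):)

    # Q-learning bootstraps from the greedy action in the next state;
    # SARSA bootstraps from the action the epsilon-greedy policy actually took.
    def q_learning_target(Q, reward, next_state, actions, gamma=0.9):
        return reward + gamma * max(Q.get((next_state, a), 0.0) for a in actions)

    def sarsa_target(Q, reward, next_state, next_action, gamma=0.9):
        return reward + gamma * Q.get((next_state, next_action), 0.0)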
2 votes · 1 answer

Python: access a dictionary that has two keys using only one key

I am currently working with Q-learning and I have a dictionary Q[state, action] where each state can be anything, e.g. a string, number, or list, depending on the application. Each state has either 3 or 4 possible actions. For each state I need to find…
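(Two common approaches, sketched here with hashable states assumed:)

    # Option 1: scan the flat dict, filtering on the state half of the key.
    def best_action(Q, state):
        candidates = {a: v for (s, a), v in Q.items() if s == state}
        return max(candidates, key=candidates.get) if candidates else None

    # Option 2: restructure as a dict of dicts, Q2[state][action] = value,
    # so a single state lookup returns all of that state's actions at once.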
2 votes · 1 answer

Solving GridWorld using Q-Learning and function approximation

I'm studying the simple GridWorld problem (3x4, as described in Russell & Norvig Ch. 21.2); I've solved it using Q-Learning and a Q-table, and now I'd like to use a function approximator instead of a matrix. I'm using MATLAB and have tried both…
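(A linear approximator is the usual first step beyond a table; a minimal Python sketch with one weight vector per action and assumed sizes for the 3x4 grid:)

    import numpy as np

    n_features, n_actions = 12, 4    # assumed: one-hot over 12 cells, 4 moves
    w = np.zeros((n_actions, n_features))

    def td_update(phi_s, a, reward, phi_next, done, alpha=0.05, gamma=0.9):
        # Q(s, a) = w[a] . phi(s); gradient-style TD update of that action's weights
        target = reward if done else reward + gamma * np.max(w @ phi_next)
        w[a] += alpha * (target - w[a] @ phi_s) * phi_s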
2 votes · 2 answers

Q Learning Grid World Scenario

I'm researching GridWorld from a Q-learning perspective. I have issues regarding the following question: 1) In the grid-world example, rewards are positive for goals, negative for running into the edge of the world, and zero the rest of the time.…
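(The reward scheme quoted in the question amounts to something like this sketch, with assumed magnitudes:)

    def reward(next_pos, moved, goal):
        # Positive at the goal, negative for running into the edge, zero otherwise.
        if next_pos == goal:
            return 1.0     # assumed goal reward
        if not moved:      # the attempted move ran into the edge of the world
            return -1.0    # assumed edge penalty
        return 0.0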
2 votes · 1 answer

Q-learning implementation

I am trying to implement Q-learning in an environment where the rewards R are stochastic, time-dependent variables that arrive in real time after a constant time interval deltaT. States S (scalars) also arrive after a constant time interval deltaT.…
2 votes · 1 answer

Q-learning: What is the correct state for reward calculation

Q-learning rewards. I'm struggling to interpret the pseudocode for the Q-learning algorithm:

    1  For each s, a initialize table entry Q(a, s) = 0
    2  Observe current state s
    3  Do forever:
    4      Select an action a and execute it
    5      Receive…
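(A direct, minimal Python translation of that pseudocode; `env` is an assumed interface whose step(action) returns (next_state, reward, done):)

    import random
    from collections import defaultdict

    def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
        Q = defaultdict(float)          # Q[(s, a)], implicitly 0 for unseen pairs
        for _ in range(episodes):
            s = env.reset()             # observe current state s
            done = False
            while not done:
                if random.random() < epsilon:
                    a = random.choice(actions)                 # explore
                else:
                    a = max(actions, key=lambda x: Q[(s, x)])  # act greedily
                s2, r, done = env.step(a)  # execute the action, receive reward
                best_next = max(Q[(s2, x)] for x in actions)
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s2
        return Q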
2 votes · 1 answer

Q Learning Algorithm Issue

I'm trying to implement a simple Q-learning algorithm, but for whatever reason it doesn't converge. The agent should basically get from one point on the 5x5 grid to the goal. When I run it, it seems to have found the optimal path; however, it doesn't…
2 votes · 2 answers

Q-learning (multiple goals)

I have just started to study Q-learning and see the possibility of using Q-learning to solve my problem. Problem: I am supposed to detect a certain combination of data; I have four matrices that act as input to my system, and I have already…