Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to learn an action-value function giving the expected utility (reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it only needs a reinforcement function to be given (i.e. a function which tells how well, or how badly, the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states or actions better than currently estimated). A common, simple way of handling this trade-off is an epsilon-greedy policy.
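A minimal sketch of epsilon-greedy action selection in Python (the function and variable names are illustrative, not taken from any particular question):

    import numpy as np

    def epsilon_greedy_action(q_values, epsilon, rng=None):
        """Pick a random action with probability epsilon, otherwise act greedily."""
        rng = rng or np.random.default_rng()
        if rng.random() < epsilon:
            return int(rng.integers(len(q_values)))   # explore: random action
        return int(np.argmax(q_values))               # exploit: greedy action

    # Q-values of 4 actions in the current state (illustrative numbers)
    q_row = np.array([0.1, 0.5, -0.2, 0.3])
    action = epsilon_greedy_action(q_row, epsilon=0.1)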

447 questions
2
votes
1 answer

Human trace data for evaluating a reinforcement learning agent playing Atari?

In recent reinforcement learning research on Atari games, agent performance is evaluated using human starts. [1507.04296] Massively Parallel Methods for Deep Reinforcement Learning [1509.06461] Deep Reinforcement Learning with Double…
keisuke
  • 2,123
  • 4
  • 20
  • 31
2
votes
2 answers

What should the Q matrix dimensions be in an open-like environment for Q-learning?

I want to implement Q-learning in OpenAI's BipedalWalker-v2, but after looking for tutorials, they always seem to use finite environments, which makes the Q matrix and reward matrix simple to initialize, e.g.…
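One common workaround for a continuous observation space, sketched below under the assumption that each observation dimension is binned, is to discretize states and back the Q "matrix" with a dictionary so nothing has to be preallocated (note that BipedalWalker-v2 also has a continuous action space, so the actions would need discretizing as well; the bounds and sizes here are toy values):

    import numpy as np
    from collections import defaultdict

    N_BINS = 10        # bins per observation dimension (assumption)
    N_ACTIONS = 4      # illustrative; BipedalWalker-v2 itself has continuous actions

    def discretize(observation, low, high, n_bins=N_BINS):
        """Map a continuous observation vector to a tuple of bin indices."""
        ratios = (np.asarray(observation) - low) / (high - low)
        bins = np.clip((ratios * n_bins).astype(int), 0, n_bins - 1)
        return tuple(bins)

    # The Q "matrix" as a dictionary: unseen states default to a zero row,
    # so nothing needs to be preallocated for a continuous state space.
    Q = defaultdict(lambda: np.zeros(N_ACTIONS))

    low, high = np.array([-1.0, -1.0]), np.array([1.0, 1.0])   # toy observation bounds
    state = discretize([0.3, -0.7], low, high)
    best_action = int(np.argmax(Q[state]))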
2
votes
2 answers

Q-Learning equation in Deep Q Network

I'm new to reinforcement learning, so I may be wrong. My questions are: Is the Q-learning equation ( Q(s, a) = r + y * max(Q(s', a')) ) used in DQN only for computing a loss function? Is the equation recursive? Assume I use DQN for, say,…
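In DQN the right-hand side of that equation is typically used as the regression target of the loss. A small NumPy sketch of building such targets (the discount factor, array shapes, and numbers are assumptions, not the questioner's setup):

    import numpy as np

    gamma = 0.99   # discount factor (assumption)

    def td_targets(rewards, next_q_values, dones):
        """DQN regression targets: r + gamma * max_a' Q(s', a'), with no bootstrap at terminal states."""
        return rewards + gamma * np.max(next_q_values, axis=1) * (1.0 - dones)

    rewards       = np.array([1.0, 0.0])
    next_q_values = np.array([[0.2, 0.5], [0.1, -0.3]])   # Q(s', .) from the (target) network
    dones         = np.array([0.0, 1.0])                  # second transition is terminal
    targets = td_targets(rewards, next_q_values, dones)
    # The loss is then e.g. the mean squared error between Q(s, a) and these targets.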
2
votes
1 answer

How does the neural network know which reward it got from an action?

I am currently working on building a Deep Q-network and I'm a bit confused about how my Q-network knows which reward I give it. For example, I have this state-action function with policy and temporal difference: and then I have my Q-network: Where I…
2
votes
0 answers

Reinforcement Learning: Dynamic obstacles and dynamic goals

As far as I understand, it's impossible for an agent to learn to avoid dynamic obstacles or to reach dynamic goals, because after the training period the agent follows a static policy which describes what action to execute for each state. I have…
siva
  • 1,183
  • 3
  • 12
  • 28
2
votes
1 answer

Have 2 versions of the same TensorFlow network with different weights and update one from the other

I am trying to implement the deep Q-learning programs DeepMind used to train an AI to play Atari games. One of the features they use, and which is mentioned in multiple tutorials, is having two versions of your neural network: one to update as you cycle…
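A minimal Keras/TensorFlow sketch of keeping two copies of the same network and periodically copying the online weights into the target copy (the architecture below is illustrative, not the questioner's):

    import tensorflow as tf

    # Online network (updated every training step); the architecture here is illustrative.
    online_net = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(4,)),
        tf.keras.layers.Dense(2),                 # one output per action
    ])

    # Target network: same architecture, its own copy of the weights.
    target_net = tf.keras.models.clone_model(online_net)
    target_net.set_weights(online_net.get_weights())

    def sync_target(online, target):
        """Copy the online network's weights into the target network."""
        target.set_weights(online.get_weights())

    # Call this every N training steps (N is a hyperparameter; the Nature DQN setup used 10,000).
    sync_target(online_net, target_net)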
2
votes
1 answer

Reward value calculation: Q-Learning

I am currently working on optimizing reward values for the Q-learning I'm doing. Right now I consider two values that calculate a specific reward value. Since this is work related, I can't specify the variable names I take into consideration. The…
2
votes
1 answer

Q-Learning Table converges to -inf

I tried to solve the OpenAI Gym mountain-car problem with my own Q-learning implementation. After trying different things it started to work really well, but after a while (20k episodes × 1000 samples per episode) I noticed that the values…
greece57
  • 421
  • 1
  • 6
  • 16
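For reference, a minimal tabular Q-learning update of the kind the mountain-car question above describes (hyperparameters and state/action indices are illustrative); values drifting towards -inf often point to a problem in this step or in how terminal states and rewards are handled:

    import numpy as np

    alpha, gamma = 0.1, 0.99    # learning rate and discount factor (typical values, not from the question)

    def q_update(Q, s, a, r, s_next, done):
        """One tabular Q-learning step: Q(s,a) += alpha * (target - Q(s,a))."""
        target = r if done else r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])

    # Toy table: 3 discretized states x 2 actions.
    Q = np.zeros((3, 2))
    q_update(Q, s=0, a=1, r=-1.0, s_next=2, done=False)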
2
votes
1 answer

Reinforcement Learning: Q and Q(λ) speed difference on Windy Grid World environment

Preface: I have attempted to solve this Windy Grid World env. Having implemented both the Q-learning and Q(λ) algorithms, the results are pretty much the same (I am looking at steps per episode). Problem: From what I have read, I believe that a higher lambda…
2
votes
0 answers

Multidimensional (7 dimensions) Array on Q-Learning State in Java

I'm coding a Q-learning implementation for a game, and the Q-learning state requires a 7-dimensional array because I store everything about the game in it (player x, player y, monsters, treasures, possible moves, etc.). Everything adds up to more…
2
votes
1 answer

What is the importance of the reward policy in reinforcement learning?

We assign a +1 reward for reaching the goal and -1 for reaching an unwanted state. Is it necessary to give something like a +0.01 reward for taking an action which moves closer to the goal and a -0.01 reward for one which does not? What will the…
2
votes
2 answers

State representation for grid world

I'm new to reinforcement learning and Q-learning, and I'm trying to understand the concepts and implement them. Most of the material I have found uses CNN layers to process image input. I think I would rather start with something simpler than that, so…
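A simple grid-world state representation that avoids CNNs, sketched below under the assumption of a small fixed-size grid: flatten the cell coordinates into a single index for a Q-table, or one-hot encode that index as the input to a small network.

    import numpy as np

    WIDTH, HEIGHT = 5, 5     # illustrative grid size

    def state_index(x, y, width=WIDTH):
        """Flatten a grid cell (x, y) into a single integer state id for a Q-table."""
        return y * width + x

    def one_hot(state_id, n_states=WIDTH * HEIGHT):
        """One-hot encoding, a common input when a small network replaces the table."""
        v = np.zeros(n_states)
        v[state_id] = 1.0
        return v

    s = state_index(2, 3)    # cell (2, 3) -> state id 17
    x = one_hot(s)           # 25-dimensional input vector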
2
votes
1 answer

Trading algorithm - actions in Q-learning/DQN

The following was completed using MATLAB. I am trying to build a trading algorithm using deep Q-learning. I have taken a year's worth of daily stock prices and am using that as the training set. My state space is [money, stock, price]: money…
2
votes
1 answer

Deep Q-learning is not converging

I'm experimenting with deep Q-learning using Keras, and I want to teach an agent to perform a task. In my problem I want to teach an agent to avoid hitting objects in its path by changing its speed (accelerate or decelerate). The agent is …
un famous
  • 35
  • 1
  • 11
2
votes
1 answer

Is it feasible to train an A3C algorithm in an episodic context?

The A3C algorithm (and N-step Q-learning) updates the globally shared network once every N timesteps. N is usually pretty small, 5 or 20 as far as I remember. Wouldn't it be possible to set N to infinity, meaning that the networks are only trained at…
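For context, a sketch of the discounted return computation behind N-step updates; with the segment covering a whole episode and no bootstrap value, this reduces to the Monte Carlo (episodic) case the question asks about (the function name and gamma are illustrative):

    def discounted_returns(rewards, gamma=0.99, bootstrap_value=0.0):
        """Discounted returns for an N-step segment, computed backwards.
        With bootstrap_value = 0 and the segment ending at a terminal state,
        this is the full Monte Carlo (episodic) return."""
        returns = []
        g = bootstrap_value
        for r in reversed(rewards):
            g = r + gamma * g
            returns.append(g)
        return list(reversed(returns))

    # Three-step segment whose last step earns the only reward.
    print(discounted_returns([0.0, 0.0, 1.0], gamma=0.9))   # approximately [0.81, 0.9, 1.0]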