Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to learn an action-value function giving the expected utility (reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it only needs a reinforcement function to be given (i.e. a function which tells how well, or how badly, the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states or actions better than currently estimated). A common, simple way of handling this trade-off is an epsilon-greedy policy.
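A minimal sketch of epsilon-greedy action selection in Python (the function and variable names are illustrative, not taken from any particular question):

    import numpy as np

    def epsilon_greedy_action(q_values, epsilon, rng=None):
        """Pick a random action with probability epsilon, otherwise act greedily."""
        rng = rng or np.random.default_rng()
        if rng.random() < epsilon:
            return int(rng.integers(len(q_values)))   # explore: random action
        return int(np.argmax(q_values))               # exploit: greedy action

    # Q-values of 4 actions in the current state (illustrative numbers)
    q_row = np.array([0.1, 0.5, -0.2, 0.3])
    action = epsilon_greedy_action(q_row, epsilon=0.1)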

447 questions
2
votes
1 answer

Human trace data for evaluating a reinforcement learning agent playing Atari?

In recent reinforcement learning research on Atari games, agent performance is evaluated using human starts. [1507.04296] Massively Parallel Methods for Deep Reinforcement Learning [1509.06461] Deep Reinforcement Learning with Double…
keisuke
  • 2,123
  • 4
  • 20
  • 31
2
votes
2 answers

What should the Q matrix dimensions be in an open-like environment for Q-learning?

I want to implement Q-learning in OpenAI's BipedalWalker-v2, but after looking for tutorials, they always seem to use finite environments, which makes the Q matrix and reward matrix simple to initialize, e.g.…
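One common workaround for a continuous observation space, sketched below under the assumption that each observation dimension is binned, is to discretize states and back the Q "matrix" with a dictionary so nothing has to be preallocated (note that BipedalWalker-v2 also has a continuous action space, so the actions would need discretizing as well; the bounds and sizes here are toy values):

    import numpy as np
    from collections import defaultdict

    N_BINS = 10        # bins per observation dimension (assumption)
    N_ACTIONS = 4      # illustrative; BipedalWalker-v2 itself has continuous actions

    def discretize(observation, low, high, n_bins=N_BINS):
        """Map a continuous observation vector to a tuple of bin indices."""
        ratios = (np.asarray(observation) - low) / (high - low)
        bins = np.clip((ratios * n_bins).astype(int), 0, n_bins - 1)
        return tuple(bins)

    # The Q "matrix" as a dictionary: unseen states default to a zero row,
    # so nothing needs to be preallocated for a continuous state space.
    Q = defaultdict(lambda: np.zeros(N_ACTIONS))

    low, high = np.array([-1.0, -1.0]), np.array([1.0, 1.0])   # toy observation bounds
    state = discretize([0.3, -0.7], low, high)
    best_action = int(np.argmax(Q[state]))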
2
votes
2 answers

Q-Learning equation in Deep Q Network

I'm new to reinforcement learning, so I may be wrong. My questions are: Is the Q-learning equation ( Q(s, a) = r + y * max(Q(s', a')) ) used in DQN only for computing a loss function? Is the equation recursive? Assume I use DQN for, say,…
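In DQN the right-hand side of that equation is typically used as the regression target of the loss. A small NumPy sketch of building such targets (the discount factor, array shapes, and numbers are assumptions, not the questioner's setup):

    import numpy as np

    gamma = 0.99   # discount factor (assumption)

    def td_targets(rewards, next_q_values, dones):
        """DQN regression targets: r + gamma * max_a' Q(s', a'), with no bootstrap at terminal states."""
        return rewards + gamma * np.max(next_q_values, axis=1) * (1.0 - dones)

    rewards       = np.array([1.0, 0.0])
    next_q_values = np.array([[0.2, 0.5], [0.1, -0.3]])   # Q(s', .) from the (target) network
    dones         = np.array([0.0, 1.0])                  # second transition is terminal
    targets = td_targets(rewards, next_q_values, dones)
    # The loss is then e.g. the mean squared error between Q(s, a) and these targets.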
2
votes
1 answer

How does the neural network know which reward it got from an action?

I am currently working on building a Deep Q-network and I'm a bit confused about how my Q-network knows which reward I give it. For example, I have this state-action function with policy and temporal difference: and then I have my Q-network: Where I…
2
votes
0 answers

Reinforcement Learning: Dynamic obstacles and dynamic goals

As far as I understand, it's impossible for an agent to learn to avoid dynamic obstacles or to reach dynamic goals, because after the training period the agent follows a static policy which describes what action to execute for each state. I have…
siva
  • 1,183
  • 3
  • 12
  • 28
2
votes
1 answer

Have 2 versions of the same TensorFlow network with different weights and update one from the other

I am trying to implement the deep Q-learning programs DeepMind used to train an AI to play Atari games. One of the features they use, and which is mentioned in multiple tutorials, is having two versions of your neural network: one to update as you cycle…
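A minimal Keras/TensorFlow sketch of keeping two copies of the same network and periodically copying the online weights into the target copy (the architecture below is illustrative, not the questioner's):

    import tensorflow as tf

    # Online network (updated every training step); the architecture here is illustrative.
    online_net = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(4,)),
        tf.keras.layers.Dense(2),                 # one output per action
    ])

    # Target network: same architecture, its own copy of the weights.
    target_net = tf.keras.models.clone_model(online_net)
    target_net.set_weights(online_net.get_weights())

    def sync_target(online, target):
        """Copy the online network's weights into the target network."""
        target.set_weights(online.get_weights())

    # Call this every N training steps (N is a hyperparameter; the Nature DQN setup used 10,000).
    sync_target(online_net, target_net)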
2
votes
1 answer

Reward value calculation: Q-Learning

I am currently working on optimizing reward values for the Q-learning I'm doing. Right now I consider two values that calculate a specific reward value. Since this is work related, I can't specify the variable names I take into consideration. The…
2
votes
1 answer

Q-Learning Table converges to -inf

I tried to solve the OpenAI Gym mountain-car problem with my own Q-learning implementation. After trying different things it started to work really well, but after a while (20k episodes × 1000 samples per episode) I noticed that the values…
greece57
  • 421
  • 1
  • 6
  • 16
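For reference, a minimal tabular Q-learning update of the kind the mountain-car question above describes (hyperparameters and state/action indices are illustrative); values drifting towards -inf often point to a problem in this step or in how terminal states and rewards are handled:

    import numpy as np

    alpha, gamma = 0.1, 0.99    # learning rate and discount factor (typical values, not from the question)

    def q_update(Q, s, a, r, s_next, done):
        """One tabular Q-learning step: Q(s,a) += alpha * (target - Q(s,a))."""
        target = r if done else r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])

    # Toy table: 3 discretized states x 2 actions.
    Q = np.zeros((3, 2))
    q_update(Q, s=0, a=1, r=-1.0, s_next=2, done=False)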
2
votes
1 answer

Reinforcement Learning: Q and Q(λ) speed difference on Windy Grid World environment

Preface: I have attempted to solve this Windy Grid World env. Having implemented both the Q-learning and Q(λ) algorithms, the results are pretty much the same (I am looking at steps per episode). Problem: From what I have read, I believe that a higher lambda…
2
votes
0 answers

Multidimensional (7 dimensions) Array on Q-Learning State in Java

I'm coding a Q-learning implementation for a game, and the Q-learning state requires a 7-dimensional array because I store everything about the game in it (player x, player y, monsters, treasures, possible moves, etc.). Everything adds up to more…
2
votes
1 answer

What is the importance of the reward policy in reinforcement learning?

We assign a +1 reward for reaching the goal and -1 for reaching an unwanted state. Is it necessary to give something like a +0.01 reward for taking an action which moves closer to the goal and a -0.01 reward for one which does not? What will the…
2
votes
2 answers

State representation for grid world

I'm new to reinforcement learning and Q-learning, and I'm trying to understand the concepts and implement them. Most of the material I have found uses CNN layers to process image input. I think I would rather start with something simpler than that, so…
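A simple grid-world state representation that avoids CNNs, sketched below under the assumption of a small fixed-size grid: flatten the cell coordinates into a single index for a Q-table, or one-hot encode that index as the input to a small network.

    import numpy as np

    WIDTH, HEIGHT = 5, 5     # illustrative grid size

    def state_index(x, y, width=WIDTH):
        """Flatten a grid cell (x, y) into a single integer state id for a Q-table."""
        return y * width + x

    def one_hot(state_id, n_states=WIDTH * HEIGHT):
        """One-hot encoding, a common input when a small network replaces the table."""
        v = np.zeros(n_states)
        v[state_id] = 1.0
        return v

    s = state_index(2, 3)    # cell (2, 3) -> state id 17
    x = one_hot(s)           # 25-dimensional input vector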
2
votes
1 answer

Trading algorithm - actions in Q-learning/DQN

The following was completed using MATLAB. I am trying to build a trading algorithm using deep Q-learning. I have taken a year's worth of daily stock prices and am using that as the training set. My state space is [money, stock, price]: money…
2
votes
1 answer

Deep Q-learning is not converging

I'm experimenting with deep Q-learning using Keras, and I want to teach an agent to perform a task. In my problem I want to teach an agent to avoid hitting objects in its path by changing its speed (accelerate or decelerate). The agent is …
un famous
  • 35
  • 1
  • 11
2
votes
1 answer

Is it feasible to train an A3C algorithm in an episodic context?

The A3C algorithm (and N-step Q-learning) updates the globally shared network once every N timesteps. N is usually pretty small, 5 or 20 as far as I remember. Wouldn't it be possible to set N to infinity, meaning that the networks are only trained at…
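For context, a sketch of the discounted return computation behind N-step updates; with the segment covering a whole episode and no bootstrap value, this reduces to the Monte Carlo (episodic) case the question asks about (the function name and gamma are illustrative):

    def discounted_returns(rewards, gamma=0.99, bootstrap_value=0.0):
        """Discounted returns for an N-step segment, computed backwards.
        With bootstrap_value = 0 and the segment ending at a terminal state,
        this is the full Monte Carlo (episodic) return."""
        returns = []
        g = bootstrap_value
        for r in reversed(rewards):
            g = r + gamma * g
            returns.append(g)
        return list(reversed(returns))

    # Three-step segment whose last step earns the only reward.
    print(discounted_returns([0.0, 0.0, 1.0], gamma=0.9))   # approximately [0.81, 0.9, 1.0]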