Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function giving the expected utility (reinforcement) of taking a given action in a given state and following the optimal policy thereafter.
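
The action-value update at the heart of Q-learning can be sketched in a few lines. This is a minimal illustration, not library code: the dict-of-dicts representation of Q and the hyperparameter values for the learning rate `alpha` and discount factor `gamma` are assumptions for the example.

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update.

    Q is a dict mapping state -> {action: value}. Applies
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    """
    # Off-policy: bootstrap from the best action in the next state,
    # regardless of which action the behavior policy actually takes.
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
```

The `max` over next-state values is what makes Q-learning off-policy: the target assumes greedy behavior from the next state on, even while the agent explores.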

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well or how badly the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states or better actions than those currently estimated). A common, simple way of handling this trade-off is an epsilon-greedy policy.
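
The epsilon-greedy policy mentioned above can be sketched as follows; the function name and the list representation of per-action values are illustrative, not taken from any particular library:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise act greedily.

    q_values is a list of estimated action values for the current state;
    returns the index of the chosen action.
    """
    if rng.random() < epsilon:
        # Exploration: uniformly random action.
        return rng.randrange(len(q_values))
    # Exploitation: action with the highest current estimate.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With epsilon = 0 this reduces to the purely greedy policy; in practice epsilon is often decayed over the course of training so the agent explores early and exploits later.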

447 questions
1 vote · 1 answer

How to efficiently update probabilities within an EnumeratedDistribution instance?

Question summary: Is there any way of updating the probabilities within an existing instance of the class EnumeratedIntegerDistribution without creating an entirely new instance? Background: I'm trying to implement a simplified Q-learning style…
topher217
1 vote · 1 answer

Bad performance of a double DQN in comparison to the vanilla DQN

I am currently trying to optimize the navigation of my robot. I first used a vanilla DQN and tuned its parameters. The simulated robot reached 8000 goals after 5000 episodes and showed satisfying learning performance. Now as…
trello123
1 vote · 0 answers

How to deploy Q-learning model?

I am trying to get familiar with reinforcement learning. I created an RL model using a Q-learning approach. Description of the problem: I have a set of customers, and each of them has the following features [price, category, cluster] - these customers…
1 vote · 1 answer

Q-learning to learn minesweeping behavior

I am attempting to use Q-learning to learn minesweeping behavior on a discrete version of Mat Buckland's smart sweepers (the original is available here: http://www.ai-junkie.com/ann/evolved/nnt1.html) for an assignment. The assignment limits us to 50…
1 vote · 0 answers

Do the Keras Adam optimizer and other momentum-based optimizers retain their past update information over different fit calls?

The Adam optimizer uses a momentum-like approach to train a neural network in fewer iterations of gradient descent than vanilla gradient descent. I'm trying to figure out whether the Adam optimizer works in a Q-learning situation where you have a…
1 vote · 1 answer

Can I design a non-deterministic reward function in Q-learning?

In the Q-learning algorithm, there is a reward function that rewards the action taken in the current state. My question is: can I have a non-deterministic reward function that is affected by the time at which an action in a state is performed? For…
Richard Hu
1 vote · 1 answer

Keras Q-learning model performance doesn't improve when playing CartPole

I'm trying to train a deep Q-learning Keras model to play CartPole-v1. However, it doesn't seem to get any better. I don't believe it's a bug but rather my lack of knowledge on how to use Keras and OpenAI Gym properly. I am following this tutorial…
1 vote · 1 answer

How can I change this to use a Q-table for reinforcement learning?

I am working on learning Q-tables and ran through a simple version which only used a 1-dimensional array to move forward and backward. Now I am trying 4-direction movement and got stuck on controlling the person. I got the random movement down; now…
MNM
1 vote · 1 answer

Epsilon-greedy algorithm

I understand the epsilon-greedy algorithm, but there is one point of confusion: is it the average reward or the value that it keeps track of? Most of the time, it is explained in the context of multi-armed bandits. However, there is no distinction of reward /…
AgnosticCucumber
1 vote · 1 answer

Model-free or model-based deep reinforcement learning for car racing?

I'm new to the field of reinforcement learning, so I'm quite confused by the terms "model-based" and "model-free". For example, in a video game, suppose I want to train an agent (a car) to drive on a racetrack. If my input is a 256x256x3 first-person image…
1 vote · 1 answer

How to teach game rules to an AI?

I am making an AI like AlphaGo using DQN, but I am having trouble teaching it the game rules. At first, the AI doesn't know the rule that it must not put a stone in a place that is already occupied. I tried to give a negative reward whenever the AI violates that…
1 vote · 1 answer

How to select the action with the highest Q-value

I have implemented DQN with experience replay. The input is 50x50x1; with a batch size of 4, the input becomes (4,50,50,1). There are 10 output actions in total, so with a batch size of 4 the output is (4,10). I want to know how I would select the max Q-value out of…
1 vote · 1 answer

Is it necessary to end episodes when a collision occurs in reinforcement learning?

I have implemented a Q-learning algorithm in which the agent tries to travel as far as possible. I am using instantaneous rewards as well as a final episode reward. When the agent collides, I give a large negative collision reward, and I am not…
1 vote · 1 answer

Network trains well on a grid of shape N but fails when evaluated on any variation

For training, I randomly generate a grid of shape N containing values 0 and 1. There are two actions defined, [0, 1], and I want to teach a policy using DQN to take action 0 when the next number is 1 and take action 1 when the next number in the array is…
1 vote · 1 answer

Algorithm for subdivision of 3D surfaces

Background: I have a 3D scene, and I want to discretize its space so that every coordinate (x, y, z) belongs to a specific cell. Coordinates close to each other belong to the same cell. When I input a coordinate that lies on the surface of one…
maurock