Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function giving the expected utility (reinforcement) of taking a given action in a given state and following the optimal policy thereafter.
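
The action-value update at the heart of Q-learning can be sketched in a few lines. This is a minimal illustration, not library code: the dict-of-dicts representation of Q and the hyperparameter values for the learning rate `alpha` and discount factor `gamma` are assumptions for the example.

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update.

    Q is a dict mapping state -> {action: value}. Applies
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    """
    # Off-policy: bootstrap from the best action in the next state,
    # regardless of which action the behavior policy actually takes.
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
```

The `max` over next-state values is what makes Q-learning off-policy: the target assumes greedy behavior from the next state on, even while the agent explores.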

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well or how badly the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states or better actions than those currently estimated). A common, simple way of handling this trade-off is an epsilon-greedy policy.
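
The epsilon-greedy policy mentioned above can be sketched as follows; the function name and the list representation of per-action values are illustrative, not taken from any particular library:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise act greedily.

    q_values is a list of estimated action values for the current state;
    returns the index of the chosen action.
    """
    if rng.random() < epsilon:
        # Exploration: uniformly random action.
        return rng.randrange(len(q_values))
    # Exploitation: action with the highest current estimate.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With epsilon = 0 this reduces to the purely greedy policy; in practice epsilon is often decayed over the course of training so the agent explores early and exploits later.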

447 questions
1 vote · 1 answer

How to efficiently update probabilities within an EnumeratedDistribution instance?

Question summary: Is there any way of updating the probabilities within an existing instance of the class EnumeratedIntegerDistribution without creating an entirely new instance? Background: I'm trying to implement a simplified Q-learning style…
topher217
1 vote · 1 answer

Bad performance of a double DQN in comparison to the vanilla DQN

I am currently trying to optimize the navigation of my robot. I first used a vanilla DQN and tuned its parameters. The simulated robot reached 8000 goals after 5000 episodes and showed satisfying learning performance. Now as…
trello123
1 vote · 0 answers

How to deploy Q-learning model?

I am trying to get familiar with reinforcement learning. I created an RL model using a Q-learning approach. Description of the problem: I have a set of customers, and each of them has the following features [price, category, cluster] - these customers…
1 vote · 1 answer

Q-learning to learn minesweeping behavior

I am attempting to use Q-learning to learn minesweeping behavior on a discrete version of Mat Buckland's smart sweepers (the original is available here: http://www.ai-junkie.com/ann/evolved/nnt1.html) for an assignment. The assignment limits us to 50…
1 vote · 0 answers

Do the Keras Adam optimizer and other momentum-based optimizers retain their past update information over different fit calls?

The Adam optimizer uses a momentum-like approach to train a neural network in fewer iterations of gradient descent than vanilla gradient descent. I'm trying to figure out whether the Adam optimizer works in a Q-learning situation where you have a…
1 vote · 1 answer

Can I design a non-deterministic reward function in Q-learning?

In the Q-learning algorithm, there is a reward function that rewards the action taken in the current state. My question is: can I have a non-deterministic reward function that is affected by the time at which an action in a state is performed? For…
Richard Hu
1 vote · 1 answer

Keras Q-learning model performance doesn't improve when playing CartPole

I'm trying to train a deep Q-learning Keras model to play CartPole-v1. However, it doesn't seem to get any better. I don't believe it's a bug but rather my lack of knowledge on how to use Keras and OpenAI Gym properly. I am following this tutorial…
1 vote · 1 answer

How can I change this to use a Q-table for reinforcement learning?

I am working on learning Q-tables and ran through a simple version which only used a 1-dimensional array to move forward and backward. Now I am trying 4-direction movement and got stuck on controlling the person. I got the random movement down; now…
MNM
1 vote · 1 answer

Epsilon-greedy algorithm

I understand the epsilon-greedy algorithm, but there is one point of confusion: is it the average reward or the value that it keeps track of? Most of the time, it is explained in the context of multi-armed bandits. However, there is no distinction of reward /…
AgnosticCucumber
1 vote · 1 answer

Model-free or model-based deep reinforcement learning for car racing?

I'm new to the field of reinforcement learning, so I'm quite confused by the terms "model-based" and "model-free". For example, in a video game, suppose I want to train an agent (a car) to drive on a racetrack. If my input is a 256x256x3 first-person image…
1 vote · 1 answer

How to teach game rules to an AI?

I am making an AI like AlphaGo using DQN, but I am having trouble teaching it the game rules. At first, the AI doesn't know the rule that it must not put a stone in a place that is already occupied. I tried to give a negative reward whenever the AI violates that…
1 vote · 1 answer

How to select the action with the highest Q-value

I have implemented DQN with experience replay. The input is 50x50x1; with a batch size of 4, the input becomes (4,50,50,1). There are 10 output actions in total, so with a batch size of 4 the output is (4,10). I want to know how I would select the max Q-value out of…
1 vote · 1 answer

Is it necessary to end episodes when a collision occurs in reinforcement learning?

I have implemented a Q-learning algorithm in which the agent tries to travel as far as possible. I am using instantaneous rewards as well as a final episode reward. When the agent collides, I give a large negative collision reward, and I am not…
1 vote · 1 answer

Network trains well on a grid of shape N but fails when evaluated on any variation

For training, I randomly generate a grid of shape N containing values 0 and 1. There are two actions defined, [0, 1], and I want to teach a policy using DQN to take action 0 when the next number is 1 and take action 1 when the next number in the array is…
1 vote · 1 answer

Algorithm for subdivision of 3D surfaces

Background: I have a 3D scene, and I want to discretize its space so that every coordinate (x, y, z) belongs to a specific cell. Coordinates close to each other belong to the same cell. When I input a coordinate that lies on the surface of one…
maurock