Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function giving the expected utility (reinforcement) of taking a given action in a given state and following the optimal policy thereafter.
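For concreteness, the update at the heart of the method can be sketched in a few lines of Python (a minimal tabular version; the function name and step sizes are illustrative, not a reference implementation):

    import numpy as np

    def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
        # One tabular Q-learning step for the transition (s, a, r, s_next):
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        td_target = r + gamma * np.max(Q[s_next])  # bootstrap from the greedy next action
        Q[s, a] += alpha * (td_target - Q[s, a])   # move the estimate toward the TD target

The max over next actions, rather than the action the behaviour policy actually takes next, is what makes Q-learning off-policy.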

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well or how badly the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states or actions better than currently estimated). A common, simple way to handle this trade-off is an epsilon-greedy policy.
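A minimal epsilon-greedy sketch (names and the default epsilon are illustrative):

    import random
    import numpy as np

    def epsilon_greedy(Q, state, n_actions, epsilon=0.1):
        # With probability epsilon take a random action (explore);
        # otherwise act greedily on the current estimates (exploit).
        if random.random() < epsilon:
            return random.randrange(n_actions)
        return int(np.argmax(Q[state]))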

447 questions
1
vote
1 answer

Problems with implementing approximate (feature-based) Q-learning

I am new to reinforcement learning. I recently learned about approximate Q-learning, or feature-based Q-learning, in which you describe states by features to save space. I have tried to implement this in a simple grid game. Here, the agent is…
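For context, the core of feature-based Q-learning can be sketched as a linear approximator (a generic illustration under assumed names and step sizes, not the asker's code):

    import numpy as np

    def q_value(w, features):
        # Approximate Q(s, a) as a linear combination: Q = w . f(s, a)
        return np.dot(w, features)

    def update_weights(w, features, reward, max_q_next, q_sa, alpha=0.05, gamma=0.9):
        # Each weight moves in proportion to its feature's share of the TD error,
        # so states that share features share what was learned.
        td_error = reward + gamma * max_q_next - q_sa
        return w + alpha * td_error * features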
1
vote
0 answers

Python Tensorflow DQN Next Steps

I can't figure out the next steps for my Deep Q Network. I'm trying to optimize bus routes. I have a distance matrix and data on stop popularity. The distance matrix is a 2D array detailing the distance between all of the stops. If there…
1
vote
2 answers

Build a matrix of available actions for Q-Learning

I am simulating an inventory management system for a retail shop; therefore, I have a (15, 15) matrix of zeros in which states are rows and actions are columns: Q = np.matrix(np.zeros([15, 15])) Specifically, 0 is the minimum and 14 the maximum…
Alessandro Ceccarelli
  • 1,775
  • 5
  • 21
  • 41
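One way to encode action availability alongside such a Q-table is a boolean mask, sketched here with an invented constraint (that ordering above a capacity of 14 is disallowed; the real rule would come from the shop's logic):

    import numpy as np

    n_states, n_actions = 15, 15
    Q = np.zeros((n_states, n_actions))

    # allowed[s, a] is True when action a is available in state s.
    allowed = np.fromfunction(lambda s, a: s + a <= 14, (n_states, n_actions))

    def best_valid_action(state):
        # Greedy choice restricted to the available actions:
        # invalid entries are masked to -inf so argmax never picks them.
        q_row = np.where(allowed[state], Q[state], -np.inf)
        return int(np.argmax(q_row))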
1
vote
0 answers

Deep Q learning, LSTM and Q-values convergence

I am implementing a Reinforcement Learning agent that takes actions given a time series of prices. The actions are, classically, buy, sell, or wait. The neural network gets one batch at a time as input, the window size is 96 steps, and I have around…
FS93
  • 51
  • 4
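A common way to wire such a network, sketched with hypothetical layer sizes (not the asker's actual architecture): an LSTM summarizes the 96-step window and a linear head emits one Q-value per action.

    import tensorflow as tf

    WINDOW, N_FEATURES, N_ACTIONS = 96, 1, 3  # 96-step window; actions: buy, sell, wait

    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(64, input_shape=(WINDOW, N_FEATURES)),  # summarize the price window
        tf.keras.layers.Dense(N_ACTIONS),  # linear output: one Q-value per action
    ])
    model.compile(optimizer="adam", loss="mse")  # regress the chosen action's Q toward its TD target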
1
vote
1 answer

Q-learning model not improving

I'm trying to solve the cartpole problem in OpenAI's Gym by Q-learning. I think I have misunderstood how Q-learning works, since my model is not improving. I'm using a dictionary as my Q-table, so I "hash" (turn into a string) every observation.…
mrfr
  • 1,724
  • 2
  • 23
  • 44
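A pitfall with stringifying CartPole observations is that the raw floats almost never repeat, so every state gets its own entry and nothing generalizes. A minimal sketch of discretizing before hashing (the rounding granularity is an assumption):

    import numpy as np

    Q = {}  # key -> list of action values

    def obs_to_key(obs, decimals=1):
        # Round first so nearby observations map to the same Q-table entry.
        return str(np.round(np.asarray(obs), decimals).tolist())

    def get_q(obs, n_actions=2):
        return Q.setdefault(obs_to_key(obs), [0.0] * n_actions)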
1
vote
1 answer

Reinforcement learning - drive to waypoint

I'm playing around with making a self-driving car in a PC game. I was thinking of using reinforcement learning and giving the car a location on the map to get to. The reward would be a function of the distance from the waypoint, and something…
DaveS
  • 105
  • 1
  • 1
  • 8
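One common shaping choice here is to reward progress toward the waypoint rather than raw distance, so each step that closes the gap pays off immediately. A sketch (the threshold and bonus values are invented):

    import math

    def waypoint_reward(pos, waypoint, prev_dist):
        # Positive reward when the car moved closer since the last step.
        dist = math.hypot(waypoint[0] - pos[0], waypoint[1] - pos[1])
        r = prev_dist - dist
        if dist < 5.0:      # hypothetical arrival threshold
            r += 100.0      # bonus for reaching the waypoint
        return r, dist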
1
vote
0 answers

How to fix "The truth value of an array with more than one element is ambiguous" error when finding objects in a dictionary?

I'm trying to implement a simple Reinforcement Learning algorithm. Basically, the agent is supposed to move from point A of a square grid to point B using Q-learning. I've gotten this to work previously using a simpler model, but now I need to…
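That error typically comes from using a NumPy array where Python expects a single boolean. A minimal reproduction and the usual fixes:

    import numpy as np

    state = np.array([2, 3])
    goal = np.array([2, 3])

    # "if state == goal:" raises the ambiguity error, because the comparison
    # yields the element-wise array [True, True] rather than one bool.
    if np.array_equal(state, goal):      # compare whole arrays explicitly
        print("reached B")

    # Dictionary lookups hit a related problem: ndarrays are unhashable,
    # so convert to a hashable key such as a tuple.
    q_table = {tuple(state): [0.0, 0.0, 0.0, 0.0]}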
1
vote
0 answers

Q-learning for optimal order placement

So the last thread I made about Reinforcement Learning was marked as too broad, which I totally understood. I've never worked with it before, so I'm trying to learn it on my own - not an easy task so far. Now, I've been reading some papers and tried…
Sergio
  • 83
  • 7
1
vote
1 answer

Q-learning, what is the effect of test episodes count on convergence?

In the following code, which solves FrozenLake 4x4 by Q-learning: in the training part, why are we playing 20 episodes of the test environment in each loop instead of just 1? I tried both numbers of iterations: when playing 20…
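The short answer is variance: on a stochastic (slippery) FrozenLake a single greedy episode is a very noisy estimate, so the return is averaged over many. A sketch assuming the classic gym step API:

    import numpy as np

    def evaluate(env, Q, episodes=20):
        # Average the return over several greedy episodes; one episode alone
        # can succeed or fail by luck on a slippery grid.
        total = 0.0
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                state, reward, done, _ = env.step(int(np.argmax(Q[state])))
                total += reward
        return total / episodes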
1
vote
1 answer

Q-Learning Intermediate Rewards

If a Q-Learning agent actually performs noticeably better against opponents in a specific card game when intermediate rewards are included, would this show a flaw in the algorithm or a flaw in its implementation?
Uzay Macar
  • 254
  • 4
  • 13
1
vote
0 answers

Q-learning with experience replay not learning

I am trying to implement experience replay (ER) in the OpenAI taxi-v2 environment. It is supposed to make the convergence faster, but it seems that the agent is not learning when I turn on experience replay. From the literature, ER is supposed to…
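For reference, a minimal replay buffer looks roughly like this (the capacity and batch size are arbitrary choices):

    import random
    from collections import deque

    buffer = deque(maxlen=50_000)  # old transitions fall off the front

    def store(state, action, reward, next_state, done):
        buffer.append((state, action, reward, next_state, done))

    def sample(batch_size=32):
        # Uniform i.i.d. minibatch; breaking the correlation between
        # consecutive steps is the point of experience replay.
        return random.sample(buffer, batch_size)

Note that sampling should only begin once len(buffer) >= batch_size.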
1
vote
0 answers

Issues with Q-learning and neural networks

I'm just starting out learning Q-learning, and I've been okay with using the tabular method to get some decent results. One game I found quite fun to apply Q-learning to was Blackjack, which seemed like a perfect MDP-type problem. I've been wanting…
1
vote
2 answers

Deep reinforcement learning - how to deal with boundaries in action space

I've built a custom reinforcement learning environment and agent similar to a labyrinth game. In the labyrinth there are 5 possible actions: up, down, left, right, and stay. But if the agent is blocked, e.g. it can't go up, how do people design the env…
Kevin Fang
  • 1,966
  • 2
  • 16
  • 31
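The two designs one usually sees are an invalid-action mask (illegal moves can never be selected) or treating the blocked move as a no-op with a small negative reward. A sketch of the masking variant (names are illustrative):

    import numpy as np

    N_ACTIONS = 5  # up, down, left, right, stay

    def masked_greedy(q_values, valid):
        # q_values: length-5 Q estimates for the current state
        # valid:    length-5 booleans, False where a wall blocks the move
        masked = np.where(valid, q_values, -np.inf)  # illegal moves never win argmax
        return int(np.argmax(masked))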
1
vote
0 answers

IndexError: index 2 is out of bounds for axis 0 with size 2 // Python 3 Q-learning

I have this piece of code and I can't find out where the mistake is coming from: boxes=(2,2,4,2) action=(0,1) num_a=2 Q_table = np.zeros(boxes+(num_a,)) if (pre_a != -1): if (s == -1): bestQ = 0 else: …
Stevy KUIMI
  • 47
  • 2
  • 6
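The shapes in the excerpt make the error reproducible: with boxes=(2,2,4,2) and num_a=2 the table has shape (2, 2, 4, 2, 2), so an index of 2 is legal only on the third axis. A minimal demonstration:

    import numpy as np

    boxes, num_a = (2, 2, 4, 2), 2
    Q_table = np.zeros(boxes + (num_a,))  # shape (2, 2, 4, 2, 2)

    print(Q_table[1, 1, 3, 1, 1])  # fine: every index is below its axis size

    # Q_table[2, 0, 0, 0, 0] would raise:
    # IndexError: index 2 is out of bounds for axis 0 with size 2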
1
vote
0 answers

Reinforcement learning with function approximation and eligibility traces

I'm currently thinking of doing TD(λ) for a DQN network. I know how to implement it if it's a table (you update Q(s,a) and e(s,a) for all state-action pairs), but what happens when the Q-value is now retrieved from a function approximator (neural…
Andy Wei
  • 618
  • 7
  • 22
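With a function approximator the trace moves from state-action pairs to parameters: keep one trace per weight, decay it each step, and add the gradient of the chosen action's Q. A linear sketch standing in for the network (sizes and step sizes are assumptions):

    import numpy as np

    n_features = 8
    theta = np.zeros(n_features)   # weights of a linear Q(s, a) = theta . phi(s, a)
    trace = np.zeros(n_features)   # one eligibility value per parameter
    alpha, gamma, lam = 0.01, 0.99, 0.9

    def td_lambda_step(phi_sa, reward, q_next, q_sa):
        # e <- gamma*lam*e + grad_theta Q(s, a); for the linear case the
        # gradient is just phi(s, a). With a neural network it would be the
        # backprop gradient of the selected action's output.
        global theta, trace
        delta = reward + gamma * q_next - q_sa
        trace[:] = gamma * lam * trace + phi_sa
        theta[:] = theta + alpha * delta * trace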