Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function giving the expected utility (reinforcement) of taking a given action in a given state and following the optimal policy thereafter.
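For concreteness, the update at the heart of the method can be sketched in a few lines of Python (a minimal tabular version; the function name and step sizes are illustrative, not a reference implementation):

    import numpy as np

    def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
        # One tabular Q-learning step for the transition (s, a, r, s_next):
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        td_target = r + gamma * np.max(Q[s_next])  # bootstrap from the greedy next action
        Q[s, a] += alpha * (td_target - Q[s, a])   # move the estimate toward the TD target

The max over next actions, rather than the action the behaviour policy actually takes next, is what makes Q-learning off-policy.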

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well or how badly the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states or actions better than currently estimated). A common, simple way to handle this trade-off is an epsilon-greedy policy.
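A minimal epsilon-greedy sketch (names and the default epsilon are illustrative):

    import random
    import numpy as np

    def epsilon_greedy(Q, state, n_actions, epsilon=0.1):
        # With probability epsilon take a random action (explore);
        # otherwise act greedily on the current estimates (exploit).
        if random.random() < epsilon:
            return random.randrange(n_actions)
        return int(np.argmax(Q[state]))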

447 questions
1
vote
1 answer

Problems with implementing approximate (feature-based) Q-learning

I am new to reinforcement learning. I recently learned about approximate Q-learning, or feature-based Q-learning, in which you describe states by features to save space. I have tried to implement this in a simple grid game. Here, the agent is…
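For context, the core of feature-based Q-learning can be sketched as a linear approximator (a generic illustration under assumed names and step sizes, not the asker's code):

    import numpy as np

    def q_value(w, features):
        # Approximate Q(s, a) as a linear combination: Q = w . f(s, a)
        return np.dot(w, features)

    def update_weights(w, features, reward, max_q_next, q_sa, alpha=0.05, gamma=0.9):
        # Each weight moves in proportion to its feature's share of the TD error,
        # so states that share features share what was learned.
        td_error = reward + gamma * max_q_next - q_sa
        return w + alpha * td_error * features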
1
vote
0 answers

Python Tensorflow DQN Next Steps

I can't figure out the next steps for my Deep Q Network. I'm trying to optimize bus routes. I have a distance matrix and data on stop popularity. The distance matrix is a 2D array detailing the distance between all of the stops. If there…
1
vote
2 answers

Build a matrix of available actions for Q-Learning

I am simulating an inventory management system for a retail shop; therefore, I have a (15, 15) matrix of zeros in which states are rows and actions are columns: Q = np.matrix(np.zeros([15, 15])) Specifically, 0 is the minimum and 14 the maximum…
Alessandro Ceccarelli
  • 1,775
  • 5
  • 21
  • 41
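One way to encode action availability alongside such a Q-table is a boolean mask, sketched here with an invented constraint (that ordering above a capacity of 14 is disallowed; the real rule would come from the shop's logic):

    import numpy as np

    n_states, n_actions = 15, 15
    Q = np.zeros((n_states, n_actions))

    # allowed[s, a] is True when action a is available in state s.
    allowed = np.fromfunction(lambda s, a: s + a <= 14, (n_states, n_actions))

    def best_valid_action(state):
        # Greedy choice restricted to the available actions:
        # invalid entries are masked to -inf so argmax never picks them.
        q_row = np.where(allowed[state], Q[state], -np.inf)
        return int(np.argmax(q_row))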
1
vote
0 answers

Deep Q learning, LSTM and Q-values convergence

I am implementing a Reinforcement Learning agent that takes actions given a time series of prices. The actions are, classically, buy, sell, or wait. The neural network gets one batch at a time as input, the window size is 96 steps, and I have around…
FS93
  • 51
  • 4
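A common way to wire such a network, sketched with hypothetical layer sizes (not the asker's actual architecture): an LSTM summarizes the 96-step window and a linear head emits one Q-value per action.

    import tensorflow as tf

    WINDOW, N_FEATURES, N_ACTIONS = 96, 1, 3  # 96-step window; actions: buy, sell, wait

    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(64, input_shape=(WINDOW, N_FEATURES)),  # summarize the price window
        tf.keras.layers.Dense(N_ACTIONS),  # linear output: one Q-value per action
    ])
    model.compile(optimizer="adam", loss="mse")  # regress the chosen action's Q toward its TD target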
1
vote
1 answer

Q-learning model not improving

I'm trying to solve the cartpole problem in OpenAI's Gym by Q-learning. I think I have misunderstood how Q-learning works, since my model is not improving. I'm using a dictionary as my Q-table, so I "hash" (turn into a string) every observation.…
mrfr
  • 1,724
  • 2
  • 23
  • 44
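A pitfall with stringifying CartPole observations is that the raw floats almost never repeat, so every state gets its own entry and nothing generalizes. A minimal sketch of discretizing before hashing (the rounding granularity is an assumption):

    import numpy as np

    Q = {}  # key -> list of action values

    def obs_to_key(obs, decimals=1):
        # Round first so nearby observations map to the same Q-table entry.
        return str(np.round(np.asarray(obs), decimals).tolist())

    def get_q(obs, n_actions=2):
        return Q.setdefault(obs_to_key(obs), [0.0] * n_actions)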
1
vote
1 answer

Reinforcement learning - drive to waypoint

I'm playing around with making a self-driving car in a PC game. I was thinking of using reinforcement learning and giving the car a location on the map to get to. The reward would be a function of the distance from the waypoint, and something…
DaveS
  • 105
  • 1
  • 1
  • 8
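One common shaping choice here is to reward progress toward the waypoint rather than raw distance, so each step that closes the gap pays off immediately. A sketch (the threshold and bonus values are invented):

    import math

    def waypoint_reward(pos, waypoint, prev_dist):
        # Positive reward when the car moved closer since the last step.
        dist = math.hypot(waypoint[0] - pos[0], waypoint[1] - pos[1])
        r = prev_dist - dist
        if dist < 5.0:      # hypothetical arrival threshold
            r += 100.0      # bonus for reaching the waypoint
        return r, dist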
1
vote
0 answers

How to fix "The truth value of an array with more than one element is ambiguous" error when finding objects in a dictionary?

I'm trying to implement a simple Reinforcement Learning algorithm. Basically, the agent is supposed to move from point A of a square grid to point B using Q-learning. I've gotten this to work previously using a simpler model, but now I need to…
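That error typically comes from using a NumPy array where Python expects a single boolean. A minimal reproduction and the usual fixes:

    import numpy as np

    state = np.array([2, 3])
    goal = np.array([2, 3])

    # "if state == goal:" raises the ambiguity error, because the comparison
    # yields the element-wise array [True, True] rather than one bool.
    if np.array_equal(state, goal):      # compare whole arrays explicitly
        print("reached B")

    # Dictionary lookups hit a related problem: ndarrays are unhashable,
    # so convert to a hashable key such as a tuple.
    q_table = {tuple(state): [0.0, 0.0, 0.0, 0.0]}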
1
vote
0 answers

Q-learning for optimal order placement

So the last thread I made about Reinforcement Learning was marked as too broad, which I totally understood. I've never worked with it before, so I'm trying to learn it on my own - not an easy task so far. Now, I've been reading some papers and tried…
Sergio
  • 83
  • 7
1
vote
1 answer

Q-learning, what is the effect of test episodes count on convergence?

In the following code, which solves FrozenLake 4x4 by Q-learning: in the training part, why are we playing 20 episodes of the test environment in each loop instead of just 1? I tried both numbers of iterations: when playing 20…
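The short answer is variance: on a stochastic (slippery) FrozenLake a single greedy episode is a very noisy estimate, so the return is averaged over many. A sketch assuming the classic gym step API:

    import numpy as np

    def evaluate(env, Q, episodes=20):
        # Average the return over several greedy episodes; one episode alone
        # can succeed or fail by luck on a slippery grid.
        total = 0.0
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                state, reward, done, _ = env.step(int(np.argmax(Q[state])))
                total += reward
        return total / episodes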
1
vote
1 answer

Q-Learning Intermediate Rewards

If a Q-Learning agent actually performs noticeably better against opponents in a specific card game when intermediate rewards are included, would this show a flaw in the algorithm or a flaw in its implementation?
Uzay Macar
  • 254
  • 4
  • 13
1
vote
0 answers

Q-learning with experience replay not learning

I am trying to implement experience replay (ER) in the OpenAI taxi-v2 environment. It is supposed to make the convergence faster, but it seems that the agent is not learning when I turn on experience replay. From the literature, ER is supposed to…
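For reference, a minimal replay buffer looks roughly like this (the capacity and batch size are arbitrary choices):

    import random
    from collections import deque

    buffer = deque(maxlen=50_000)  # old transitions fall off the front

    def store(state, action, reward, next_state, done):
        buffer.append((state, action, reward, next_state, done))

    def sample(batch_size=32):
        # Uniform i.i.d. minibatch; breaking the correlation between
        # consecutive steps is the point of experience replay.
        return random.sample(buffer, batch_size)

Note that sampling should only begin once len(buffer) >= batch_size.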
1
vote
0 answers

Issues with Q-learning and neural networks

I'm just starting out learning Q-learning, and I've been okay with using the tabular method to get some decent results. One game I found quite fun to apply Q-learning to was Blackjack, which seemed like a perfect MDP-type problem. I've been wanting…
1
vote
2 answers

Deep reinforcement learning - how to deal with boundaries in action space

I've built a custom reinforcement learning environment and agent similar to a labyrinth game. In the labyrinth there are 5 possible actions: up, down, left, right, and stay. But if the agent is blocked, e.g. it can't go up, how do people design the env…
Kevin Fang
  • 1,966
  • 2
  • 16
  • 31
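The two designs one usually sees are an invalid-action mask (illegal moves can never be selected) or treating the blocked move as a no-op with a small negative reward. A sketch of the masking variant (names are illustrative):

    import numpy as np

    N_ACTIONS = 5  # up, down, left, right, stay

    def masked_greedy(q_values, valid):
        # q_values: length-5 Q estimates for the current state
        # valid:    length-5 booleans, False where a wall blocks the move
        masked = np.where(valid, q_values, -np.inf)  # illegal moves never win argmax
        return int(np.argmax(masked))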
1
vote
0 answers

IndexError: index 2 is out of bounds for axis 0 with size 2 // Python 3 Q-learning

I have this piece of code and I can't find out where the mistake is coming from: boxes=(2,2,4,2) action=(0,1) num_a=2 Q_table = np.zeros(boxes+(num_a,)) if (pre_a != -1): if (s == -1): bestQ = 0 else: …
Stevy KUIMI
  • 47
  • 2
  • 6
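The shapes in the excerpt make the error reproducible: with boxes=(2,2,4,2) and num_a=2 the table has shape (2, 2, 4, 2, 2), so an index of 2 is legal only on the third axis. A minimal demonstration:

    import numpy as np

    boxes, num_a = (2, 2, 4, 2), 2
    Q_table = np.zeros(boxes + (num_a,))  # shape (2, 2, 4, 2, 2)

    print(Q_table[1, 1, 3, 1, 1])  # fine: every index is below its axis size

    # Q_table[2, 0, 0, 0, 0] would raise:
    # IndexError: index 2 is out of bounds for axis 0 with size 2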
1
vote
0 answers

Reinforcement learning with function approximation and eligibility traces

I'm currently thinking of doing TD(λ) for a DQN network. I know how to implement it if it's a table (you update Q(s,a) and e(s,a) for all state-action pairs), but what happens when the Q-value is now retrieved from a function approximator (neural…
Andy Wei
  • 618
  • 7
  • 22
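With a function approximator the trace moves from state-action pairs to parameters: keep one trace per weight, decay it each step, and add the gradient of the chosen action's Q. A linear sketch standing in for the network (sizes and step sizes are assumptions):

    import numpy as np

    n_features = 8
    theta = np.zeros(n_features)   # weights of a linear Q(s, a) = theta . phi(s, a)
    trace = np.zeros(n_features)   # one eligibility value per parameter
    alpha, gamma, lam = 0.01, 0.99, 0.9

    def td_lambda_step(phi_sa, reward, q_next, q_sa):
        # e <- gamma*lam*e + grad_theta Q(s, a); for the linear case the
        # gradient is just phi(s, a). With a neural network it would be the
        # backprop gradient of the selected action's output.
        global theta, trace
        delta = reward + gamma * q_next - q_sa
        trace[:] = gamma * lam * trace + phi_sa
        theta[:] = theta + alpha * delta * trace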