Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function giving the expected utility (reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well or how badly the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states, or actions better than those currently estimated). A common, simple way of handling this trade-off is an epsilon-greedy policy.
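As a quick illustration, here is a minimal epsilon-greedy selection sketch in Python/NumPy; the table size and the epsilon value are illustrative assumptions, not taken from any question below:

    import numpy as np

    def epsilon_greedy(Q, state, epsilon=0.1):
        # with probability epsilon, explore: pick a uniformly random action
        if np.random.rand() < epsilon:
            return np.random.randint(Q.shape[1])
        # otherwise exploit: pick the action with the highest estimated Q-value
        return int(np.argmax(Q[state]))

    Q = np.zeros((16, 4))                    # e.g. 16 states, 4 actions
    action = epsilon_greedy(Q, state=0)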

447 questions
0 votes, 1 answer

FrozenLake Q-Learning Update Issue

I'm learning Q-learning and trying to build a Q-learner for the FrozenLake-v0 problem in OpenAI Gym. Since the problem has only 16 states and 4 possible actions it should be fairly easy, but it looks like my algorithm is not updating the Q-table…
snowneji • 1,086 • 1 • 11 • 25
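For reference, the tabular update that should modify the Q-table at every step is the standard one below; a frequent bug is computing the target but never writing it back into the table. A minimal NumPy sketch (the learning rate and discount are assumed values):

    import numpy as np

    Q = np.zeros((16, 4))          # FrozenLake-v0: 16 states, 4 actions
    alpha, gamma = 0.8, 0.95       # assumed learning rate and discount

    def update(s, a, r, s_next):
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

    update(s=0, a=2, r=0.0, s_next=4)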
0 votes, 1 answer

Why sample randomly from the replay memory for DQN?

I'm trying to gain an intuitive understanding of deep reinforcement learning. In deep Q-networks (DQN) we store all states/actions/rewards in a memory array and, at the end of the episode, "replay" them through our neural network. This makes…
ZAR • 2,550 • 4 • 36 • 66
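A short note on the why: consecutive transitions are strongly correlated, and training on them in order makes gradient updates unstable; sampling uniformly at random breaks that correlation. A minimal sketch of such a buffer (all names and values are illustrative):

    import random
    from collections import deque

    buffer = deque(maxlen=100_000)            # bounded replay memory

    # store one (s, a, r, s', done) transition per step (dummy values here)
    for step in range(1000):
        buffer.append((step, 0, 0.0, step + 1, False))

    # train on a random minibatch instead of consecutive steps
    minibatch = random.sample(buffer, 32)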
0 votes, 1 answer

Feeding a TensorFlow placeholder from an array

I'm trying to train CartPole-v0 using Q-learning. When trying to update the replay buffer with experience I am getting the following error: ValueError: Cannot feed value of shape (128,) for Tensor 'Placeholder_1:0', which has shape '(?, 2)' The…
Dee • 153 • 3 • 15
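That error usually means a 1-D batch of scalars is being fed into a placeholder declared with a trailing dimension; for CartPole's two actions the usual fix is to one-hot encode before feeding. A hedged TF1-style sketch (the placeholder is an assumption, not the asker's code):

    import numpy as np
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()

    actions_ph = tf.placeholder(tf.float32, shape=(None, 2))  # expects (?, 2)

    raw = np.random.randint(0, 2, size=(128,))   # shape (128,): mismatch
    one_hot = np.eye(2)[raw]                     # shape (128, 2): matches

    with tf.Session() as sess:
        print(sess.run(tf.shape(actions_ph), feed_dict={actions_ph: one_hot}))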
0 votes, 1 answer

How does deep Q-learning work?

When I am training my model I have the following segment:

    s_t_batch, a_batch, y_batch = train_data(minibatch, model2)
    # perform gradient step
    loss.append(model.train_on_batch([s_t_batch, a_batch], y_batch))

where s_t, a_ correspond to the current…
sachinruk • 9,571 • 12 • 55 • 86
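For context, the usual deep Q-learning step builds regression targets from the Bellman equation and then fits the network to them with a call like train_on_batch. A minimal NumPy sketch of the target computation (variable names are illustrative):

    import numpy as np

    GAMMA = 0.99

    def q_targets(rewards, next_q, dones):
        # Bellman targets: r + gamma * max_a' Q(s', a'), zeroed at terminal states
        return rewards + GAMMA * next_q.max(axis=1) * (1.0 - dones)

    rewards = np.array([1.0, 0.0])
    next_q = np.array([[0.5, 0.2], [0.1, 0.3]])
    dones = np.array([0.0, 1.0])
    print(q_targets(rewards, next_q, dones))   # [1.495 0.   ]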
0 votes, 0 answers

Fastest way to compare a large number of vectors of vectors containing int values

I am building a single-player board game as a hobby, and a Q-learner for it. I will create a table of rewards (state, action), following the philosophy of Q-learning. I will take each board state after a key press as a 'state', and the board is…
NONONONONO • 612 • 1 • 6 • 10
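A common answer here is to avoid pairwise comparison altogether: make each board state hashable (e.g. a tuple) and key the Q-table with a dictionary, so lookup is a single hash instead of many vector comparisons. A Python sketch under the assumption of a small integer grid and 4 actions:

    from collections import defaultdict

    # Q-table keyed by hashable board states; one Q-value per action
    q_table = defaultdict(lambda: [0.0] * 4)

    def to_key(board):
        # flatten the 2-D board into a tuple so it can be hashed
        return tuple(cell for row in board for cell in row)

    board = [[0, 1], [2, 0]]
    q_table[to_key(board)][2] = 0.5        # update Q(state, action=2)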
0 votes, 1 answer

How do I index into a TensorFlow tensor from another array?

I am trying to write a deep Q-learning network for a problem in AI. I have a function predict() that produces a tensor of shape (None, 3), taking an input of shape (None, 5). The 3 in (None, 3) corresponds to the Q-value of each action that can be…
Ananda • 2,925 • 5 • 22 • 45
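The usual pattern for this is to pull out, per row, the Q-value of the action actually taken, using a one-hot mask (tf.gather_nd works as well). A TF2 sketch with made-up values:

    import tensorflow as tf

    q_values = tf.constant([[1.0, 2.0, 3.0],
                            [4.0, 5.0, 6.0]])       # shape (batch, 3)
    actions = tf.constant([2, 0])                    # chosen action per row

    # multiply by a one-hot mask and sum to select one Q-value per row
    mask = tf.one_hot(actions, depth=3)
    chosen_q = tf.reduce_sum(q_values * mask, axis=1)  # -> [3.0, 4.0]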
0 votes, 1 answer

Q-learning with a state-action-state reward structure and a Q-matrix with states as rows and actions as columns

I have set up a Q-learning problem in R, and would like some help with the theoretical correctness of my approach to framing the problem. Problem structure: for this problem, the environment consists of 10 possible states. When in each state, the…
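For reference, a state-action-state reward R(s, a, s') is compatible with a Q-matrix that has states as rows and actions as columns; the next state simply also determines the reward inside the update. A NumPy sketch (in Python rather than R, for consistency with the other examples here; the sizes and the random reward tensor are assumptions):

    import numpy as np

    n_states, n_actions = 10, 4
    Q = np.zeros((n_states, n_actions))                # rows: states, cols: actions
    R = np.random.rand(n_states, n_actions, n_states)  # R[s, a, s']
    alpha, gamma = 0.1, 0.9

    s, a, s_next = 0, 1, 3                             # one observed transition
    r = R[s, a, s_next]                                # reward depends on s' too
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])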
0 votes, 0 answers

Q-learning with clustered time series

I am new to Q-learning, and I recently tried to apply this algorithm to a problem with 9 states and 2 possible actions. I am considering a large number of time series, each of which has only 10 data points, and want to choose between two actions at…
som • 11 • 3
0 votes, 0 answers

Q-Learning Neural Network in Lasagne

I'm just beginning to experiment with neural networks and was hoping to create a neural network capable of learning to play the game Gomoku via Q-learning. After reading through some of the Lasagne tutorials and API, I am unsure how to proceed with my…
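For orientation, a Q-network in Lasagne is a stack of layers whose linear output layer has one unit per action; for Gomoku, one Q-value per board position is a natural choice. A minimal sketch assuming a 15x15 board flattened to 225 inputs (all sizes are assumptions):

    import lasagne

    l_in = lasagne.layers.InputLayer(shape=(None, 225))    # flattened board
    l_hid = lasagne.layers.DenseLayer(
        l_in, num_units=256,
        nonlinearity=lasagne.nonlinearities.rectify)
    l_out = lasagne.layers.DenseLayer(
        l_hid, num_units=225,                              # one Q per position
        nonlinearity=lasagne.nonlinearities.linear)

    q_values = lasagne.layers.get_output(l_out)   # symbolic Theano expression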
0 votes, 1 answer

Automatic differentiation in policy gradient networks

I do understand the backpropagation in policy gradient networks, but am not sure how it works with libraries that auto-differentiate, that is, how they transform it into a supervised learning problem. For example, the code below: Y = self.probs +…
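The usual trick is to feed the actions actually taken as pseudo-labels and weight each sample's cross-entropy loss by its (discounted) return, so the autodiff library ends up computing the policy gradient. A hedged Keras sketch with placeholder data (this is not the asker's code):

    import numpy as np
    from tensorflow import keras

    # tiny policy network: 4 state features -> probabilities over 2 actions
    model = keras.Sequential([
        keras.layers.Dense(16, activation="relu", input_shape=(4,)),
        keras.layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

    states = np.random.randn(32, 4).astype("float32")
    actions = np.random.randint(0, 2, size=32)       # actions actually taken
    returns = np.random.rand(32).astype("float32")   # discounted returns

    # sample_weight turns cross-entropy into a policy-gradient estimate
    model.train_on_batch(states, actions, sample_weight=returns)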
0 votes, 1 answer

Reinforcement learning in a dynamic environment with a large state-action space

I have a 500*500 grid with 7 different penalty values. I need to make an RL agent whose action space contains 11 actions (Left, Right, Up, Down, 4 diagonal directions, Speed Up, Speed Down and Normal Speed). How can I solve this problem? The…
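With 250,000 cells (times speed levels) a table over raw states is impractical, so the standard answer is function approximation: encode the state as a small feature vector and learn Q(s, .) with a network. A hedged Keras sketch (the feature encoding is an assumption):

    import numpy as np
    from tensorflow import keras

    # features: normalized (x, y) position plus a speed level
    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(3,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(11),                  # one Q-value per action
    ])
    model.compile(optimizer="adam", loss="mse")

    state = np.array([[250 / 500, 100 / 500, 1.0]], dtype="float32")
    q_values = model.predict(state, verbose=0)   # shape (1, 11)
    action = int(np.argmax(q_values))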
0 votes, 0 answers

How does Q-learning + NN work?

I'm trying to figure out the code from the second part of this article (Q-learning + NN): https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0 1) Why do we…
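For reference, the network in that part of the tutorial replaces the table lookup with a single linear layer: the state goes in as a one-hot vector and one Q-value per action comes out. A hedged TF1-style sketch of that idea (not the article's exact code):

    import numpy as np
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()

    inputs = tf.placeholder(tf.float32, shape=(1, 16))    # one-hot state
    W = tf.Variable(tf.random_uniform([16, 4], 0, 0.01))  # the 'table' as weights
    q_out = tf.matmul(inputs, W)                          # one Q-value per action
    predict = tf.argmax(q_out, axis=1)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        state = np.identity(16)[3:4]                      # one-hot state 3
        a = sess.run(predict, feed_dict={inputs: state})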
0 votes, 1 answer

DQN (reinforcement learning): should the state be standardized?

This is my state dataframe:

    >> state_df.head()
              A         B         C
    0 -1.469587 -1.186974 -1.136587
    1 -1.310300 -1.032667 -1.389515
    2 -0.041564 -0.112118 -0.742551
    3  0.698519  0.453808 -0.194451
    4 …
user3595632 • 5,380 • 10 • 55 • 111
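Generally yes: standardizing network inputs tends to help training. A common sketch with scikit-learn, under the added assumption that the scaler is fitted on training states only and reused unchanged afterwards:

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    train_states = np.random.randn(100, 3)   # stand-in for state_df's A, B, C

    scaler = StandardScaler()
    scaler.fit(train_states)                 # fit on training data only
    standardized = scaler.transform(train_states)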
0 votes, 2 answers

Reinforcement learning: do I have to ignore hyperparameters after training is done in Q-learning?

The learner might be in the training stage, where it updates the Q-table for a bunch of epochs. In this stage, the Q-table is updated with gamma (the discount rate) and the learning rate (alpha), and actions are chosen by a random action rate. After some epochs, when the reward…
user3595632 • 5,380 • 10 • 55 • 111
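Short answer: after training, alpha and gamma are simply not used any more (there are no further updates), and the random action rate is typically set to zero so the agent acts greedily. A minimal sketch:

    import numpy as np

    Q = np.random.rand(16, 4)      # a trained Q-table (placeholder values)

    def act_trained(state):
        # inference: no alpha/gamma updates, no random exploration
        return int(np.argmax(Q[state]))

    action = act_trained(state=5)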
0 votes, 0 answers

Experience replay in Q-learning explodes

I implemented Q-learning in the game of Tron and it is working perfectly. However, I want to implement experience replay to see how that works. This is what I do: I store the current state, the action taken, the reward, and the highest Q-value…
Stefan1993 • 193 • 2 • 2 • 15
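A frequent cause of exploding values in this setup is storing a Q-value (the target) with the experience, which goes stale as the table changes. The common fix, sketched below with illustrative names, is to store the raw transition and recompute the target from the current Q-values at replay time:

    import random
    from collections import deque

    import numpy as np

    Q = np.zeros((16, 4))
    alpha, gamma = 0.1, 0.9
    memory = deque(maxlen=10_000)

    # store the raw transition, not a precomputed target or max-Q
    memory.append((0, 1, 1.0, 3, False))     # (s, a, r, s', done)

    for s, a, r, s_next, done in random.sample(memory, 1):
        # recompute the target with the *current* Q-table at replay time
        target = r + (0.0 if done else gamma * Q[s_next].max())
        Q[s, a] += alpha * (target - Q[s, a])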