Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function giving the expected utility (reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well or how badly the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states, or actions better than those currently estimated). A common, simple way of handling this trade-off is an epsilon-greedy policy.
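As a quick illustration, here is a minimal epsilon-greedy selection sketch in Python/NumPy; the table size and the epsilon value are illustrative assumptions, not taken from any question below:

    import numpy as np

    def epsilon_greedy(Q, state, epsilon=0.1):
        # with probability epsilon, explore: pick a uniformly random action
        if np.random.rand() < epsilon:
            return np.random.randint(Q.shape[1])
        # otherwise exploit: pick the action with the highest estimated Q-value
        return int(np.argmax(Q[state]))

    Q = np.zeros((16, 4))                    # e.g. 16 states, 4 actions
    action = epsilon_greedy(Q, state=0)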

447 questions
0 votes, 1 answer

FrozenLake Q-Learning Update Issue

I'm learning Q-learning and trying to build a Q-learner for the FrozenLake-v0 problem in OpenAI Gym. Since the problem has only 16 states and 4 possible actions it should be fairly easy, but it looks like my algorithm is not updating the Q-table…
snowneji • 1,086 • 1 • 11 • 25
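For reference, the tabular update that should modify the Q-table at every step is the standard one below; a frequent bug is computing the target but never writing it back into the table. A minimal NumPy sketch (the learning rate and discount are assumed values):

    import numpy as np

    Q = np.zeros((16, 4))          # FrozenLake-v0: 16 states, 4 actions
    alpha, gamma = 0.8, 0.95       # assumed learning rate and discount

    def update(s, a, r, s_next):
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

    update(s=0, a=2, r=0.0, s_next=4)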
0 votes, 1 answer

Why sample randomly from the replay memory for DQN?

I'm trying to gain an intuitive understanding of deep reinforcement learning. In deep Q-networks (DQN) we store all states/actions/rewards in a memory array and, at the end of the episode, "replay" them through our neural network. This makes…
ZAR • 2,550 • 4 • 36 • 66
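A short note on the why: consecutive transitions are strongly correlated, and training on them in order makes gradient updates unstable; sampling uniformly at random breaks that correlation. A minimal sketch of such a buffer (all names and values are illustrative):

    import random
    from collections import deque

    buffer = deque(maxlen=100_000)            # bounded replay memory

    # store one (s, a, r, s', done) transition per step (dummy values here)
    for step in range(1000):
        buffer.append((step, 0, 0.0, step + 1, False))

    # train on a random minibatch instead of consecutive steps
    minibatch = random.sample(buffer, 32)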
0 votes, 1 answer

Feeding a TensorFlow placeholder from an array

I'm trying to train CartPole-v0 using Q-learning. When trying to update the replay buffer with experience I am getting the following error: ValueError: Cannot feed value of shape (128,) for Tensor 'Placeholder_1:0', which has shape '(?, 2)' The…
Dee • 153 • 3 • 15
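That error usually means a 1-D batch of scalars is being fed into a placeholder declared with a trailing dimension; for CartPole's two actions the usual fix is to one-hot encode before feeding. A hedged TF1-style sketch (the placeholder is an assumption, not the asker's code):

    import numpy as np
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()

    actions_ph = tf.placeholder(tf.float32, shape=(None, 2))  # expects (?, 2)

    raw = np.random.randint(0, 2, size=(128,))   # shape (128,): mismatch
    one_hot = np.eye(2)[raw]                     # shape (128, 2): matches

    with tf.Session() as sess:
        print(sess.run(tf.shape(actions_ph), feed_dict={actions_ph: one_hot}))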
0 votes, 1 answer

How does deep Q-learning work?

When I am training my model I have the following segment:

    s_t_batch, a_batch, y_batch = train_data(minibatch, model2)
    # perform gradient step
    loss.append(model.train_on_batch([s_t_batch, a_batch], y_batch))

where s_t, a_ correspond to the current…
sachinruk • 9,571 • 12 • 55 • 86
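For context, the usual deep Q-learning step builds regression targets from the Bellman equation and then fits the network to them with a call like train_on_batch. A minimal NumPy sketch of the target computation (variable names are illustrative):

    import numpy as np

    GAMMA = 0.99

    def q_targets(rewards, next_q, dones):
        # Bellman targets: r + gamma * max_a' Q(s', a'), zeroed at terminal states
        return rewards + GAMMA * next_q.max(axis=1) * (1.0 - dones)

    rewards = np.array([1.0, 0.0])
    next_q = np.array([[0.5, 0.2], [0.1, 0.3]])
    dones = np.array([0.0, 1.0])
    print(q_targets(rewards, next_q, dones))   # [1.495 0.   ]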
0 votes, 0 answers

Fastest way to compare a large number of vectors of vectors containing int values

I am building a single-player board game as a hobby, and a Q-learner for it. I will create a table of rewards (state, action), following the philosophy of Q-learning. I will take each board state after a key press as a 'state', and the board is…
NONONONONO • 612 • 1 • 6 • 10
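A common answer here is to avoid pairwise comparison altogether: make each board state hashable (e.g. a tuple) and key the Q-table with a dictionary, so lookup is a single hash instead of many vector comparisons. A Python sketch under the assumption of a small integer grid and 4 actions:

    from collections import defaultdict

    # Q-table keyed by hashable board states; one Q-value per action
    q_table = defaultdict(lambda: [0.0] * 4)

    def to_key(board):
        # flatten the 2-D board into a tuple so it can be hashed
        return tuple(cell for row in board for cell in row)

    board = [[0, 1], [2, 0]]
    q_table[to_key(board)][2] = 0.5        # update Q(state, action=2)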
0 votes, 1 answer

How do I index into a TensorFlow tensor from another array?

I am trying to write a deep Q-learning network for a problem in AI. I have a function predict() that produces a tensor of shape (None, 3), taking an input of shape (None, 5). The 3 in (None, 3) corresponds to the Q-value of each action that can be…
Ananda • 2,925 • 5 • 22 • 45
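The usual pattern for this is to pull out, per row, the Q-value of the action actually taken, using a one-hot mask (tf.gather_nd works as well). A TF2 sketch with made-up values:

    import tensorflow as tf

    q_values = tf.constant([[1.0, 2.0, 3.0],
                            [4.0, 5.0, 6.0]])       # shape (batch, 3)
    actions = tf.constant([2, 0])                    # chosen action per row

    # multiply by a one-hot mask and sum to select one Q-value per row
    mask = tf.one_hot(actions, depth=3)
    chosen_q = tf.reduce_sum(q_values * mask, axis=1)  # -> [3.0, 4.0]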
0 votes, 1 answer

Q-learning with a state-action-state reward structure and a Q-matrix with states as rows and actions as columns

I have set up a Q-learning problem in R, and would like some help with the theoretical correctness of my approach to framing the problem. Problem structure: for this problem, the environment consists of 10 possible states. When in each state, the…
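For reference, a state-action-state reward R(s, a, s') is compatible with a Q-matrix that has states as rows and actions as columns; the next state simply also determines the reward inside the update. A NumPy sketch (in Python rather than R, for consistency with the other examples here; the sizes and the random reward tensor are assumptions):

    import numpy as np

    n_states, n_actions = 10, 4
    Q = np.zeros((n_states, n_actions))                # rows: states, cols: actions
    R = np.random.rand(n_states, n_actions, n_states)  # R[s, a, s']
    alpha, gamma = 0.1, 0.9

    s, a, s_next = 0, 1, 3                             # one observed transition
    r = R[s, a, s_next]                                # reward depends on s' too
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])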
0 votes, 0 answers

Q-learning with clustered time series

I am new to Q-learning, and I recently tried to apply this algorithm to a problem with 9 states and 2 possible actions. I am considering a large number of time series, each of which has only 10 data points, and want to choose between two actions at…
som • 11 • 3
0 votes, 0 answers

Q-Learning Neural Network in Lasagne

I'm just beginning to experiment with neural networks and was hoping to create a neural network capable of learning to play the game Gomoku via Q-learning. After reading through some of the Lasagne tutorials and API, I am unsure how to proceed with my…
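For orientation, a Q-network in Lasagne is a stack of layers whose linear output layer has one unit per action; for Gomoku, one Q-value per board position is a natural choice. A minimal sketch assuming a 15x15 board flattened to 225 inputs (all sizes are assumptions):

    import lasagne

    l_in = lasagne.layers.InputLayer(shape=(None, 225))    # flattened board
    l_hid = lasagne.layers.DenseLayer(
        l_in, num_units=256,
        nonlinearity=lasagne.nonlinearities.rectify)
    l_out = lasagne.layers.DenseLayer(
        l_hid, num_units=225,                              # one Q per position
        nonlinearity=lasagne.nonlinearities.linear)

    q_values = lasagne.layers.get_output(l_out)   # symbolic Theano expression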
0 votes, 1 answer

Automatic differentiation in policy gradient networks

I do understand the backpropagation in policy gradient networks, but am not sure how it works with libraries that auto-differentiate, that is, how they transform it into a supervised learning problem. For example, the code below: Y = self.probs +…
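The usual trick is to feed the actions actually taken as pseudo-labels and weight each sample's cross-entropy loss by its (discounted) return, so the autodiff library ends up computing the policy gradient. A hedged Keras sketch with placeholder data (this is not the asker's code):

    import numpy as np
    from tensorflow import keras

    # tiny policy network: 4 state features -> probabilities over 2 actions
    model = keras.Sequential([
        keras.layers.Dense(16, activation="relu", input_shape=(4,)),
        keras.layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

    states = np.random.randn(32, 4).astype("float32")
    actions = np.random.randint(0, 2, size=32)       # actions actually taken
    returns = np.random.rand(32).astype("float32")   # discounted returns

    # sample_weight turns cross-entropy into a policy-gradient estimate
    model.train_on_batch(states, actions, sample_weight=returns)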
0 votes, 1 answer

Reinforcement learning in a dynamic environment with a large state-action space

I have a 500*500 grid with 7 different penalty values. I need to make an RL agent whose action space contains 11 actions (Left, Right, Up, Down, 4 diagonal directions, Speed Up, Speed Down and Normal Speed). How can I solve this problem? The…
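With 250,000 cells (times speed levels) a table over raw states is impractical, so the standard answer is function approximation: encode the state as a small feature vector and learn Q(s, .) with a network. A hedged Keras sketch (the feature encoding is an assumption):

    import numpy as np
    from tensorflow import keras

    # features: normalized (x, y) position plus a speed level
    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(3,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(11),                  # one Q-value per action
    ])
    model.compile(optimizer="adam", loss="mse")

    state = np.array([[250 / 500, 100 / 500, 1.0]], dtype="float32")
    q_values = model.predict(state, verbose=0)   # shape (1, 11)
    action = int(np.argmax(q_values))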
0 votes, 0 answers

How does Q-learning + NN work?

I'm trying to figure out the code from the second part of this article (Q-learning + NN): https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0 1) Why do we…
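For reference, the network in that part of the tutorial replaces the table lookup with a single linear layer: the state goes in as a one-hot vector and one Q-value per action comes out. A hedged TF1-style sketch of that idea (not the article's exact code):

    import numpy as np
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()

    inputs = tf.placeholder(tf.float32, shape=(1, 16))    # one-hot state
    W = tf.Variable(tf.random_uniform([16, 4], 0, 0.01))  # the 'table' as weights
    q_out = tf.matmul(inputs, W)                          # one Q-value per action
    predict = tf.argmax(q_out, axis=1)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        state = np.identity(16)[3:4]                      # one-hot state 3
        a = sess.run(predict, feed_dict={inputs: state})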
0 votes, 1 answer

DQN (reinforcement learning): should the state be standardized?

This is my state dataframe:

    >> state_df.head()
              A         B         C
    0 -1.469587 -1.186974 -1.136587
    1 -1.310300 -1.032667 -1.389515
    2 -0.041564 -0.112118 -0.742551
    3  0.698519  0.453808 -0.194451
    4 …
user3595632 • 5,380 • 10 • 55 • 111
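Generally yes: standardizing network inputs tends to help training. A common sketch with scikit-learn, under the added assumption that the scaler is fitted on training states only and reused unchanged afterwards:

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    train_states = np.random.randn(100, 3)   # stand-in for state_df's A, B, C

    scaler = StandardScaler()
    scaler.fit(train_states)                 # fit on training data only
    standardized = scaler.transform(train_states)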
0 votes, 2 answers

Reinforcement learning: do I have to ignore hyperparameters after training is done in Q-learning?

The learner might be in the training stage, where it updates the Q-table for a bunch of epochs. In this stage, the Q-table is updated with gamma (the discount rate) and the learning rate (alpha), and actions are chosen by a random action rate. After some epochs, when the reward…
user3595632 • 5,380 • 10 • 55 • 111
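Short answer: after training, alpha and gamma are simply not used any more (there are no further updates), and the random action rate is typically set to zero so the agent acts greedily. A minimal sketch:

    import numpy as np

    Q = np.random.rand(16, 4)      # a trained Q-table (placeholder values)

    def act_trained(state):
        # inference: no alpha/gamma updates, no random exploration
        return int(np.argmax(Q[state]))

    action = act_trained(state=5)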
0 votes, 0 answers

Experience replay in Q-learning explodes

I implemented Q-learning in the game of Tron and it is working perfectly. However, I want to implement experience replay to see how that works. This is what I do: I store the current state, the action taken, the reward, and the highest Q-value…
Stefan1993 • 193 • 2 • 2 • 15
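A frequent cause of exploding values in this setup is storing a Q-value (the target) with the experience, which goes stale as the table changes. The common fix, sketched below with illustrative names, is to store the raw transition and recompute the target from the current Q-values at replay time:

    import random
    from collections import deque

    import numpy as np

    Q = np.zeros((16, 4))
    alpha, gamma = 0.1, 0.9
    memory = deque(maxlen=10_000)

    # store the raw transition, not a precomputed target or max-Q
    memory.append((0, 1, 1.0, 3, False))     # (s, a, r, s', done)

    for s, a, r, s_next, done in random.sample(memory, 1):
        # recompute the target with the *current* Q-table at replay time
        target = r + (0.0 if done else gamma * Q[s_next].max())
        Q[s, a] += alpha * (target - Q[s, a])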