Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function giving the expected utility (reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well, or how badly, the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily in terms of the current action-value function) vs exploration (acting randomly to discover new states or better actions than currently estimated). A common, simple approach to handling this trade-off is an epsilon-greedy policy.
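As a concrete illustration of the trade-off above, here is a minimal tabular Q-learning sketch with an epsilon-greedy policy on a toy 5-state chain. The environment, state/action encoding, and hyperparameters are all illustrative assumptions, not part of this tag wiki:

```python
import random

# Minimal tabular Q-learning with an epsilon-greedy policy on a toy
# 5-state chain (environment and hyperparameters are illustrative).
N_STATES = 5          # states 0..4; state 4 is the goal and ends the episode
ACTIONS = [0, 1]      # 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Deterministic chain: right moves toward the goal, left away from it."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

def choose_action(Q, state):
    """Epsilon-greedy: explore with probability EPSILON, else act greedily
    (ties between equally valued actions are broken at random)."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

def train(episodes=500, seed=0):
    random.seed(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(Q, state)
            next_state, reward, done = step(state, action)
            # Q-learning update: bootstrap from the greedy (max) action.
            target = reward + GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += ALPHA * (target - Q[(state, action)])
            state = next_state
    return Q

Q = train()
# After training, the greedy policy prefers "right" in every non-terminal state.
print(all(Q[(s, 1)] > Q[(s, 0)] for s in range(N_STATES - 1)))
```

Random tie-breaking in `choose_action` matters here: with zero-initialized Q-values, always taking the first maximal action would bias early episodes toward one direction and slow learning.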

447 questions
1 vote • 2 answers

Q-learning, how about picking the action that actually gives most reward?

So in Q-learning, you update the Q function by Qnew(s,a) = Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a)). Now, if I were to use the same principle but change Q to a V function, instead of performing the action based on the current V function, you…
Andy Wei • 618 • 7 • 22
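For reference, the update rule quoted in the excerpt above can be written as a single tabular update. The dict layout and variable names here are illustrative, not taken from the question:

```python
# One tabular Q-learning update for a transition (s, a, r, s_next).
# Q maps (state, action) -> value; alpha is the learning rate and
# gamma the discount factor (names are illustrative).
def q_update(Q, s, a, r, s_next, actions, alpha, gamma):
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

Q = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}
q_update(Q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1], alpha=0.5, gamma=0.9)
print(Q[(0, 1)])  # 0.5 * (1.0 + 0.9 * 2.0 - 0.0) = 1.4
```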
1 vote • 1 answer

How the invariant reward helps training?

I am new to Machine Learning, and I am trying to solve MountainCar-v0 using Q-learning. I can solve the problem now, but I am still confused. According to the MountainCar-v0 wiki, the reward remains -1 for every step, even if the car has reached…
1 vote • 1 answer

Number of Q values for a deep reinforcement learning network

I am currently developing a deep reinforcement learning network; however, I have a small doubt about the number of Q-values I will have at the output of the NN. I will have a total of 150 Q-values, which personally seems excessive to me. I have read…
1 vote • 2 answers

Python: updating a 2D array of dictionaries

I'm working on a Q-learning project that involves a circle solving a maze, and there is a problem with how I update the Q values, but I'm not sure where. I have legit spent 3 days on this now and I am at my wits' end. Upon closer inspection it…
Jessica Chambers • 1,246 • 5 • 28 • 56
1 vote • 1 answer

Confusion with Q learning Episode Definition

After reading some tutorials, I am still unsure about the definition of an episode. Is an episode defined as one walk-through from the start state to an exit/goal state?
1 vote • 0 answers

Q-Learning neural network implementation

I was trying to implement Q-Learning with neural networks. I've got q-learning with a q-table working perfectly fine. I am playing a little "catch the cheese" game. It looks something like this: # # # # # # # # # . . . . . . # # . $ . . . . # # . .…
Finn Eggers • 857 • 8 • 21
1 vote • 1 answer

Q-Learning: Inaccurate predictions

I recently started getting into Q-learning. I'm currently writing an agent that should play a simple board game (Othello) against a random opponent. However, it is not working properly: either my agent stays around a 50% win rate or gets…
Exzone • 53 • 4
1 vote • 1 answer

MDP & Reinforcement Learning - Convergence Comparison of VI, PI and QLearning Algorithms

I have implemented the VI (Value Iteration), PI (Policy Iteration), and QLearning algorithms using Python. After comparing results, I noticed something: the VI and PI algorithms converge to the same utilities and policies. With the same parameters, QLearning…
1 vote • 0 answers

Approximate Q learning in pacman java

I have been working on a Pacman AI using Approximate Q-learning. I don't have a background in machine learning. At the moment, I don't have ghosts in the maze. The maze is huge, 31 * 36. The feature I currently have is the distance to a dot, as on page…
Levi • 321 • 2 • 12
1 vote • 1 answer

How to add constraint to reinforcement learning (Q-learning)

I want to know how to add a constraint to Q-learning. I have an action resulting in two rewards every time (reward 1 = delivery cost, reward 2 = delivery time). I want to minimize the cost while ensuring the max delivery time limit is not violated. Is…
1 vote • 0 answers

TypeError: Cannot interpret feed_dict key as Tensor: The name 'save/Const:0' refers to a Tensor which does not exist

From this file: https://github.com/llSourcell/pong_neural_network_live/blob/master/RL.py I've updated the lines #first convolutional layer. bias vector #creates an empty tensor with all elements set to zero with a shape W_conv1 =…
Dr. Div • 951 • 14 • 26
1 vote • 1 answer

Randomize Optimal Action Choice

I'm working on the code below for a self-driving car program. I have an issue in my choose_action function. The agent should be choosing a random action from the set of actions that have the highest Q-value, in the step below: "else: …
modLmakur • 531 • 2 • 8 • 24
1 vote • 0 answers

Simple Q Learning Example in Python 3

I am working on a simple Q-learning code in Python. After running several iterations, the program suggests a valid path, but not always the shortest, which is the point of the program. I am not sure what I am overlooking. I am using a Jupyter…
Kris • 11 • 2
1 vote • 1 answer

Capturing state as array in QLearning with Accord.net

I'm trying to implement QLearning for simulated ants in Unity. Following Accord's Animat example, I managed to implement the gist of the algorithm. Now my agent has 5 state inputs: three of them come from sensors that detect obstacles in front of…
Hookkid • 178 • 3 • 12
1 vote • 2 answers

prioritized experience replay in deep Q-learning

I was implementing DQN on the mountain car problem from OpenAI Gym. This problem is special as the positive reward is very sparse, so I thought of implementing prioritized experience replay as proposed in this paper by Google DeepMind. There are certain…