Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function giving the expected utility (reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well, or how badly, the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily in terms of the current action-value function) vs exploration (acting randomly to discover new states or better actions than currently estimated). A common, simple approach to handling this trade-off is an epsilon-greedy policy.
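As a concrete illustration of the trade-off above, here is a minimal tabular Q-learning sketch with an epsilon-greedy policy on a toy 5-state chain. The environment, state/action encoding, and hyperparameters are all illustrative assumptions, not part of this tag wiki:

```python
import random

# Minimal tabular Q-learning with an epsilon-greedy policy on a toy
# 5-state chain (environment and hyperparameters are illustrative).
N_STATES = 5          # states 0..4; state 4 is the goal and ends the episode
ACTIONS = [0, 1]      # 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Deterministic chain: right moves toward the goal, left away from it."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

def choose_action(Q, state):
    """Epsilon-greedy: explore with probability EPSILON, else act greedily
    (ties between equally valued actions are broken at random)."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

def train(episodes=500, seed=0):
    random.seed(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(Q, state)
            next_state, reward, done = step(state, action)
            # Q-learning update: bootstrap from the greedy (max) action.
            target = reward + GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += ALPHA * (target - Q[(state, action)])
            state = next_state
    return Q

Q = train()
# After training, the greedy policy prefers "right" in every non-terminal state.
print(all(Q[(s, 1)] > Q[(s, 0)] for s in range(N_STATES - 1)))
```

Random tie-breaking in `choose_action` matters here: with zero-initialized Q-values, always taking the first maximal action would bias early episodes toward one direction and slow learning.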

447 questions
1 vote • 2 answers

Q-learning, how about picking the action that actually gives most reward?

So in Q-learning, you update the Q function by Qnew(s,a) = Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a)). Now, if I were to use the same principle but change Q to a V function, instead of performing the action based on the current V function, you…
Andy Wei • 618 • 7 • 22
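For reference, the update rule quoted in the excerpt above can be written as a single tabular update. The dict layout and variable names here are illustrative, not taken from the question:

```python
# One tabular Q-learning update for a transition (s, a, r, s_next).
# Q maps (state, action) -> value; alpha is the learning rate and
# gamma the discount factor (names are illustrative).
def q_update(Q, s, a, r, s_next, actions, alpha, gamma):
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

Q = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}
q_update(Q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1], alpha=0.5, gamma=0.9)
print(Q[(0, 1)])  # 0.5 * (1.0 + 0.9 * 2.0 - 0.0) = 1.4
```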
1 vote • 1 answer

How the invariant reward helps training?

I am new to Machine Learning, and I am trying to solve MountainCar-v0 using Q-learning. I can solve the problem now, but I am still confused. According to the MountainCar-v0 wiki, the reward remains -1 for every step, even if the car has reached…
1 vote • 1 answer

Number of Q values for a deep reinforcement learning network

I am currently developing a deep reinforcement learning network; however, I have a small doubt about the number of Q-values I will have at the output of the NN. I will have a total of 150 Q-values, which personally seems excessive to me. I have read…
1 vote • 2 answers

Python: updating a 2D array of dictionaries

I'm working on a Q-learning project that involves a circle solving a maze, and there is a problem with how I update the Q values, but I'm not sure where. I have legit spent 3 days on this now and I am at my wits' end. Upon closer inspection it…
Jessica Chambers • 1,246 • 5 • 28 • 56
1 vote • 1 answer

Confusion with Q learning Episode Definition

After reading some tutorials, I am still unsure about the definition of an episode. Is an episode defined as one walk-through from the start state to an exit/goal state?
1 vote • 0 answers

Q-Learning neural network implementation

I was trying to implement Q-Learning with neural networks. I've got q-learning with a q-table working perfectly fine. I am playing a little "catch the cheese" game. It looks something like this: # # # # # # # # # . . . . . . # # . $ . . . . # # . .…
Finn Eggers • 857 • 8 • 21
1 vote • 1 answer

Q-Learning: Inaccurate predictions

I recently started getting into Q-learning. I'm currently writing an agent that should play a simple board game (Othello) against a random opponent. However, it is not working properly: either my agent stays around a 50% win rate or gets…
Exzone • 53 • 4
1 vote • 1 answer

MDP & Reinforcement Learning - Convergence Comparison of VI, PI and QLearning Algorithms

I have implemented the VI (Value Iteration), PI (Policy Iteration), and QLearning algorithms using Python. After comparing results, I noticed something: the VI and PI algorithms converge to the same utilities and policies. With the same parameters, QLearning…
1 vote • 0 answers

Approximate Q learning in pacman java

I have been working on a Pacman AI using Approximate Q-learning. I don't have a background in machine learning. At the moment, I don't have ghosts in the maze. The maze is huge, 31 * 36. The feature I currently have is the distance to a dot, as on page…
Levi • 321 • 2 • 12
1 vote • 1 answer

How to add constraint to reinforcement learning (Q-learning)

I want to know how to add a constraint to Q-learning. I have an action resulting in two rewards every time (reward 1 = delivery cost, reward 2 = delivery time). I want to minimize the cost while ensuring the max delivery time limit is not violated. Is…
1 vote • 0 answers

TypeError: Cannot interpret feed_dict key as Tensor: The name 'save/Const:0' refers to a Tensor which does not exist

From this file: https://github.com/llSourcell/pong_neural_network_live/blob/master/RL.py I've updated the lines #first convolutional layer. bias vector #creates an empty tensor with all elements set to zero with a shape W_conv1 =…
Dr. Div • 951 • 14 • 26
1 vote • 1 answer

Randomize Optimal Action Choice

I'm working on the code below for a self-driving car program. I have an issue in my choose_action function. The agent should be choosing a random action from the set of actions that have the highest Q-value, in the step below: "else: …
modLmakur • 531 • 2 • 8 • 24
1 vote • 0 answers

Simple Q Learning Example in Python 3

I am working on a simple Q-learning code in Python. After running several iterations, the program suggests a valid path, but not always the shortest, which is the point of the program. I am not sure what I am overlooking. I am using a Jupyter…
Kris • 11 • 2
1 vote • 1 answer

Capturing state as array in QLearning with Accord.net

I'm trying to implement QLearning for simulated ants in Unity. Following Accord's Animat example, I managed to implement the gist of the algorithm. Now my agent has 5 state inputs: three of them come from sensors that detect obstacles in front of…
Hookkid • 178 • 3 • 12
1 vote • 2 answers

prioritized experience replay in deep Q-learning

I was implementing DQN on the mountain car problem from OpenAI Gym. This problem is special as the positive reward is very sparse, so I thought of implementing prioritized experience replay as proposed in this paper by Google DeepMind. There are certain…