Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function that gives the expected utility (reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well, or how badly, the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) vs. exploration (acting randomly to discover new states or actions better than currently estimated). A common simple way of handling this trade-off is an epsilon-greedy policy.
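
For illustration, a minimal tabular Q-learning sketch with an epsilon-greedy behaviour policy (assuming an old-style Gym environment with discrete observation and action spaces; all hyperparameters are placeholder choices):

    import numpy as np

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        # Tabular Q-learning; assumes discrete observation/action spaces.
        Q = np.zeros((env.observation_space.n, env.action_space.n))
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # Exploration vs. exploitation: random action with probability
                # epsilon, otherwise greedy under the current Q estimate.
                if np.random.rand() < epsilon:
                    action = env.action_space.sample()
                else:
                    action = int(np.argmax(Q[state]))
                next_state, reward, done, _ = env.step(action)
                # Off-policy TD update: bootstrap from the greedy (max) action.
                Q[state, action] += alpha * (
                    reward + gamma * np.max(Q[next_state]) - Q[state, action]
                )
                state = next_state
        return Q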

447 questions
2 votes • 1 answer

Questions About Deep Q-Learning

I read several materials about deep Q-learning and I'm not sure if I understand it completely. From what I learned, it seems that Deep Q-learning computes Q-values faster by using a NN to perform a regression rather than storing them in a table,…
mad • 2,677 • 8 • 35 • 78
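
What the question describes, a network regressing Q-values instead of a table lookup, might look like this minimal sketch (layer sizes and the 4-dimensional state are illustrative assumptions, not details from the question):

    import numpy as np
    from tensorflow import keras

    # A small Q-network: maps a state vector to one Q-value per action,
    # replacing the row lookup Q[state] of tabular Q-learning.
    n_state_dims, n_actions = 4, 2  # illustrative sizes
    model = keras.Sequential([
        keras.layers.Dense(32, activation="relu", input_shape=(n_state_dims,)),
        keras.layers.Dense(n_actions)  # linear output: regression on Q-values
    ])
    model.compile(optimizer="adam", loss="mse")

    state = np.random.rand(1, n_state_dims)
    q_values = model.predict(state)       # analogous to reading Q[state, :]
    action = int(np.argmax(q_values[0]))  # greedy action under the estimate
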
2 votes • 2 answers

tf.losses.mean_squared_error with negative target

I'm using Q-learning and I want to know if I can use the tf.losses.mean_squared_error loss calculation function if I have a reward function which can give negative rewards. Because if I have, for example, as output of my network the following Q values…
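
Squared error is indifferent to the sign of the targets, since it only measures the squared difference between prediction and target. A quick numeric check, using plain NumPy to stand in for the TF op (which computes the same mean of squared differences):

    import numpy as np

    predictions = np.array([0.5, -1.2, 2.0])
    targets = np.array([-1.0, -1.5, 1.0])  # negative targets are fine

    # The same quantity tf.losses.mean_squared_error computes:
    # the mean of the element-wise squared differences.
    mse = np.mean((predictions - targets) ** 2)
    print(mse)  # ~1.113; squaring drops the sign, so negatives pose no problem
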
2 votes • 1 answer

DQN - How to feed the input of 4 still frames from a game as one single state input

I was reading this blog about Deep Q-Learning. 1- In the input section of the blog, I wanted to know how we feed the 4 still frames/screenshots from the game, which represent the input state, into the policy network. Will all 4 frames be fed…
Hazzaldo • 515 • 1 • 8 • 24
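
The usual DQN answer is to stack the frames along the channel dimension so the network sees one tensor per state. A sketch (the 84x84 frame size and grayscale preprocessing are the conventional DQN choices, assumed here rather than taken from the blog):

    import numpy as np
    from collections import deque

    # Keep the 4 most recent preprocessed frames; together they form one state.
    frames = deque(maxlen=4)

    def preprocess(screen):
        # Placeholder: grayscale + resize to 84x84 in a real pipeline.
        return np.zeros((84, 84), dtype=np.float32)

    def get_state(new_screen):
        frames.append(preprocess(new_screen))
        while len(frames) < 4:            # pad at episode start
            frames.append(frames[-1])
        # Shape (84, 84, 4): the network takes all 4 frames in one forward
        # pass, as channels of a single input, not as 4 separate inputs.
        return np.stack(frames, axis=-1)
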
2 votes • 1 answer

How to assign states in a DQN (Deep Q-Network)?

I am making a flight simulation with autopilot, so I need to make a DQN (Deep Q-Network) to control the autopilot, but I don't know the optimal number of states. The simulation is done in Unity and all the environment and physics are done too, the…
yousif fayed • 331 • 1 • 4 • 20
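
With a DQN the state does not need to be enumerated at all: it is a continuous feature vector fed to the network, so there is no "optimal number of states" to pick, only a choice of input features. A sketch with purely illustrative flight variables:

    import numpy as np

    def make_state(altitude, pitch, roll, yaw, airspeed, vertical_speed):
        # The DQN input is just this vector; its length fixes the network's
        # input size, so no discrete state count is ever needed.
        return np.array([altitude, pitch, roll, yaw,
                         airspeed, vertical_speed], dtype=np.float32)

    state = make_state(1200.0, 0.05, -0.02, 1.57, 65.0, -0.5)
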
2 votes • 0 answers

Is this true? What about Expected SARSA and Double Q-Learning?

I'm studying Reinforcement Learning and I'm facing a problem understanding the difference between SARSA, Q-Learning, Expected SARSA, Double Q-Learning and temporal difference. Can you please explain the difference and tell me when to use each? And…
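
For reference, the four methods differ only in the bootstrap target of the same TD update. A side-by-side sketch (Q, Q1, Q2 are value tables and pi holds the behaviour policy's action probabilities; all names are illustrative):

    import numpy as np

    # All four share: Q[s, a] += alpha * (target - Q[s, a])
    # and differ only in how `target` bootstraps from the next state s2.

    def sarsa_target(Q, r, s2, a2, gamma):
        return r + gamma * Q[s2, a2]              # uses the action actually taken

    def q_learning_target(Q, r, s2, gamma):
        return r + gamma * np.max(Q[s2])          # greedy bootstrap: off-policy

    def expected_sarsa_target(Q, pi, r, s2, gamma):
        return r + gamma * np.dot(pi[s2], Q[s2])  # expectation under the policy

    def double_q_target(Q1, Q2, r, s2, gamma):
        # Select with one table, evaluate with the other: reduces max-bias.
        return r + gamma * Q2[s2, np.argmax(Q1[s2])]
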
2 votes • 0 answers

Displaying OpenAI Gym Environment Render In TKinter

I am currently creating a GUI in Tkinter in which the user can specify hyperparameters for an agent to learn how to play Taxi-v2 in the OpenAI gym environment. I want to know how I should go about displaying the trained agent playing an episode in…
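
Since Taxi-v2 renders as text rather than pixels, one option is to capture the ANSI render each step and push it into a Tkinter label. A sketch assuming an older gym version (where Taxi-v2 exists and render(mode='ansi') returns either a str or a StringIO depending on the release, so both are handled):

    import re
    import tkinter as tk
    import gym

    env = gym.make("Taxi-v2")
    root = tk.Tk()
    label = tk.Label(root, font=("Courier", 12), justify="left")
    label.pack()
    env.reset()

    def show_frame():
        frame = env.render(mode="ansi")
        text = frame if isinstance(frame, str) else frame.getvalue()
        # Strip ANSI colour escape codes, which a Tk label cannot interpret.
        label.config(text=re.sub(r"\x1b\[[0-9;]*m", "", text))

    def step():
        action = env.action_space.sample()  # stand-in for the trained agent
        _, _, done, _ = env.step(action)
        show_frame()
        if not done:
            root.after(300, step)  # schedule next step on the Tk event loop

    show_frame()
    root.after(300, step)
    root.mainloop()
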
2 votes • 1 answer

What is the Full Meaning of the Discount Factor γ (gamma) in Reinforcement Learning?

I'm relatively new to machine learning concepts, and I have been following several lectures/tutorials covering Q-Learning, such as Stanford's Lecture on Reinforcement Learning. They all give short or vague answers as to what exactly gamma's utility is…
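
Concretely, gamma sets how sharply future rewards are discounted in the return G = r0 + gamma*r1 + gamma^2*r2 + …; a tiny numeric illustration:

    # Discounted return of the same reward stream under different gammas.
    rewards = [1.0, 1.0, 1.0, 1.0, 1.0]

    def discounted_return(rewards, gamma):
        return sum(gamma ** t * r for t, r in enumerate(rewards))

    print(discounted_return(rewards, 0.0))  # 1.0: fully myopic
    print(discounted_return(rewards, 0.9))  # ~4.1: future rewards matter
    print(discounted_return(rewards, 1.0))  # 5.0: undiscounted sum
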
2 votes • 1 answer

Inconsistencies between tf.contrib.layer.fully_connected, tf.layers.dense, tf.contrib.slim.fully_connected, tf.keras.layers.Dense

I am trying to implement policy gradient for a contextual bandit problem (https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-1-5-contextual-bandits-bff01d1aad9c). I am defining a model in TensorFlow to solve this…
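
One concrete inconsistency among those layers is their defaults: tf.contrib.layers.fully_connected (and its slim alias) applies ReLU by default, while tf.layers.dense and tf.keras.layers.Dense default to a linear activation. A TF1-style sketch making the activation explicit so the variants match:

    import tensorflow as tf  # TF1-style API, as in the question

    x = tf.placeholder(tf.float32, [None, 16])

    # tf.contrib.layers.fully_connected defaults to activation_fn=tf.nn.relu;
    # tf.layers.dense / tf.keras.layers.Dense default to no activation.
    # Passing the activation explicitly removes that source of divergence.
    a = tf.contrib.layers.fully_connected(x, 8, activation_fn=tf.nn.relu)
    b = tf.layers.dense(x, 8, activation=tf.nn.relu)
    c = tf.keras.layers.Dense(8, activation=tf.nn.relu)(x)
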
2 votes • 0 answers

Q-Learning policy doesn't agree with Value/Policy Iteration

I am playing with pymdptoolbox. It has a built-in forest-management problem. It can generate transition matrices P and R by specifying a state value for the forest function (default value is 3). The implementations of QLearning, PolicyIteration and…
Chenyang • 161 • 1 • 11
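
A minimal way to reproduce that comparison with pymdptoolbox (note QLearning uses sampled updates, so its policy can disagree with the exact solvers unless it runs for enough iterations):

    import mdptoolbox.example
    import mdptoolbox.mdp

    # Built-in forest-management MDP; S is the number of states.
    P, R = mdptoolbox.example.forest(S=3)

    vi = mdptoolbox.mdp.ValueIteration(P, R, 0.96)
    vi.run()

    ql = mdptoolbox.mdp.QLearning(P, R, 0.96)
    ql.run()  # sampled updates; may need many iterations to match vi.policy

    print(vi.policy, ql.policy)
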
2 votes • 1 answer

Convergence of Q-learning on the inverted pendulum

Hello, I'm working on total control of the cartpole problem (inverted pendulum). My aim is for the system to reach stability, meaning all the states (x, xdot, theta and thetadot) should converge to zero. I am using Q-learning with a reward function as…
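
Since the question's reward function is cut off, here is one common choice for driving all four states to zero, a negative quadratic cost over the full state (the weights are illustrative assumptions):

    import numpy as np

    def reward(x, x_dot, theta, theta_dot):
        # Quadratic penalty: maximal (zero) only when every state is zero,
        # so the greedy policy is pulled toward the upright, centred rest state.
        w = np.array([1.0, 0.1, 10.0, 0.1])      # illustrative weights
        s = np.array([x, x_dot, theta, theta_dot])
        return -float(np.dot(w, s ** 2))
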
2 votes • 1 answer

Sarsa and Q-Learning (reinforcement learning) don't converge to the optimal policy

I have a question about my own project for testing reinforcement learning techniques. First let me explain the purpose. I have an agent which can take 4 actions during 8 steps. At the end of these eight steps, the agent can be in 5 possible…
T.L • 21 • 4
2 votes • 2 answers

How to implement Q-learning to approximate an optimal control?

I am interested in implementing Q-learning (or some form of reinforcement learning) to find an optimal protocol. Currently, I have a function written in Python which takes in the protocol or "action" and "state" and returns a new state and a…
tooty44 • 6,829 • 9 • 27 • 39
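
If the environment already exists as a Python function mapping (state, action) to (new state, reward), tabular Q-learning can be wrapped directly around it. A sketch with a hypothetical step function and integer-indexed states and actions:

    import numpy as np

    def step(state, action):
        # Hypothetical stand-in for the question's existing Python function:
        # returns (next_state, reward).
        return (state + action) % 10, -abs(state - 5)

    n_states, n_actions = 10, 4
    Q = np.zeros((n_states, n_actions))
    alpha, gamma, epsilon = 0.1, 0.95, 0.1

    state = 0
    for _ in range(10000):
        if np.random.rand() < epsilon:
            action = np.random.randint(n_actions)   # explore
        else:
            action = int(np.argmax(Q[state]))       # exploit
        next_state, r = step(state, action)
        Q[state, action] += alpha * (r + gamma * np.max(Q[next_state])
                                     - Q[state, action])
        state = next_state
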
2 votes • 1 answer

Unable to learn MountainCar using Q-Learning with Function Approximation

I am trying to implement a linear function approximation for solving MountainCar using Q-learning. I know this environment can't be perfectly approximated with a linear function due to the spiral-like shape of the optimal policy, but the behaviour I…
ivallesp • 2,018 • 1 • 14 • 21
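
For reference, the core of linear Q-learning is the semi-gradient TD update on per-action weight vectors. A sketch with a hypothetical polynomial feature map over MountainCar's (position, velocity) state (tile coding or RBF features usually approximate this environment far better):

    import numpy as np

    n_actions, n_features = 3, 8
    W = np.zeros((n_actions, n_features))  # one weight vector per action

    def phi(state):
        # Hypothetical polynomial features over (position, velocity).
        p, v = state
        return np.array([1.0, p, v, p * v, p * p, v * v, p * p * v, p * v * v])

    def q(state, action):
        return float(np.dot(W[action], phi(state)))

    def update(state, action, reward, next_state, alpha=0.01, gamma=0.99):
        # Semi-gradient Q-learning: differentiate only the prediction side.
        target = reward + gamma * max(q(next_state, a) for a in range(n_actions))
        W[action] += alpha * (target - q(state, action)) * phi(state)
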
2 votes • 2 answers

Model for OpenAI gym's Lunar Lander not converging

I am trying to use deep reinforcement learning with Keras to train an agent to learn how to play the Lunar Lander OpenAI gym environment. The problem is that my model is not converging. Here is my code: import numpy as np import gym from…
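
Two ingredients that often decide whether a DQN converges on LunarLander are an experience-replay buffer and a slowly refreshed target network. A sketch of the replay part (buffer and batch sizes are typical choices, not taken from the question's code):

    import random
    from collections import deque
    import numpy as np

    buffer = deque(maxlen=100000)  # experience replay: decorrelates samples

    def remember(s, a, r, s2, done):
        buffer.append((s, a, r, s2, done))

    def sample_batch(batch_size=64):
        # Uniform random minibatch; assumes len(buffer) >= batch_size.
        batch = random.sample(buffer, batch_size)
        s, a, r, s2, d = map(np.array, zip(*batch))
        return s, a, r, s2, d

    # Target network: keep a frozen copy of the online model and refresh it
    # occasionally (e.g. every 1000 steps) so the regression targets
    # r + gamma * max_a Q_target(s2, a) stay stable between refreshes:
    # target_model.set_weights(model.get_weights())
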
2 votes • 1 answer

How to define a state in Python for reinforcement learning

I need to create a state space for my RL problem, which has about 10 state variables, each of which takes about 2 or 3 values. That would make the state space about 600,000 states. How do I implement this in Python?
Yomal • 362 • 2 • 15
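
A standard way to index such a factored state space is to treat the variables as digits of a mixed-radix number, which NumPy supports directly (the per-variable value counts below are illustrative):

    import numpy as np

    # Number of values each of the 10 state variables can take (illustrative).
    sizes = (3, 3, 2, 3, 2, 3, 3, 2, 3, 3)

    def state_to_index(values):
        # Maps e.g. (0, 2, 1, ...) to a unique integer in [0, prod(sizes)).
        return int(np.ravel_multi_index(values, sizes))

    def index_to_state(index):
        return np.unravel_index(index, sizes)

    n_states = int(np.prod(sizes))  # total table size for tabular methods
    Q = np.zeros((n_states, 4))     # e.g. a Q-table with 4 actions
    idx = state_to_index((0, 2, 1, 0, 1, 2, 0, 1, 2, 0))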