Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function that gives the expected utility (reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well, or how badly, the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) vs. exploration (acting randomly to discover new states or actions better than currently estimated). A common simple way of handling this trade-off is an epsilon-greedy policy.
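
For illustration, a minimal tabular Q-learning sketch with an epsilon-greedy behaviour policy (assuming an old-style Gym environment with discrete observation and action spaces; all hyperparameters are placeholder choices):

    import numpy as np

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        # Tabular Q-learning; assumes discrete observation/action spaces.
        Q = np.zeros((env.observation_space.n, env.action_space.n))
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # Exploration vs. exploitation: random action with probability
                # epsilon, otherwise greedy under the current Q estimate.
                if np.random.rand() < epsilon:
                    action = env.action_space.sample()
                else:
                    action = int(np.argmax(Q[state]))
                next_state, reward, done, _ = env.step(action)
                # Off-policy TD update: bootstrap from the greedy (max) action.
                Q[state, action] += alpha * (
                    reward + gamma * np.max(Q[next_state]) - Q[state, action]
                )
                state = next_state
        return Q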

447 questions
2 votes • 1 answer

Questions About Deep Q-Learning

I read several materials about deep Q-learning and I'm not sure if I understand it completely. From what I learned, it seems that Deep Q-learning computes Q-values faster by using a NN to perform a regression rather than storing them in a table,…
mad • 2,677 • 8 • 35 • 78
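
What the question describes, a network regressing Q-values instead of a table lookup, might look like this minimal sketch (layer sizes and the 4-dimensional state are illustrative assumptions, not details from the question):

    import numpy as np
    from tensorflow import keras

    # A small Q-network: maps a state vector to one Q-value per action,
    # replacing the row lookup Q[state] of tabular Q-learning.
    n_state_dims, n_actions = 4, 2  # illustrative sizes
    model = keras.Sequential([
        keras.layers.Dense(32, activation="relu", input_shape=(n_state_dims,)),
        keras.layers.Dense(n_actions)  # linear output: regression on Q-values
    ])
    model.compile(optimizer="adam", loss="mse")

    state = np.random.rand(1, n_state_dims)
    q_values = model.predict(state)       # analogous to reading Q[state, :]
    action = int(np.argmax(q_values[0]))  # greedy action under the estimate
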
2 votes • 2 answers

tf.losses.mean_squared_error with negative target

I'm using Q-learning and I want to know if I can use the tf.losses.mean_squared_error loss calculation function if I have a reward function which can give negative rewards. Because if I have, for example, as output of my network the following Q values…
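
Squared error is indifferent to the sign of the targets, since it only measures the squared difference between prediction and target. A quick numeric check, using plain NumPy to stand in for the TF op (which computes the same mean of squared differences):

    import numpy as np

    predictions = np.array([0.5, -1.2, 2.0])
    targets = np.array([-1.0, -1.5, 1.0])  # negative targets are fine

    # The same quantity tf.losses.mean_squared_error computes:
    # the mean of the element-wise squared differences.
    mse = np.mean((predictions - targets) ** 2)
    print(mse)  # ~1.113; squaring drops the sign, so negatives pose no problem
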
2 votes • 1 answer

DQN - How to feed the input of 4 still frames from a game as one single state input

I was reading this blog about Deep Q-Learning. 1- In the input section of the blog, I wanted to know how we feed the 4 still frames/screenshots from the game, which represent the input state, into the policy network. Will all 4 frames be fed…
Hazzaldo • 515 • 1 • 8 • 24
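
The usual DQN answer is to stack the frames along the channel dimension so the network sees one tensor per state. A sketch (the 84x84 frame size and grayscale preprocessing are the conventional DQN choices, assumed here rather than taken from the blog):

    import numpy as np
    from collections import deque

    # Keep the 4 most recent preprocessed frames; together they form one state.
    frames = deque(maxlen=4)

    def preprocess(screen):
        # Placeholder: grayscale + resize to 84x84 in a real pipeline.
        return np.zeros((84, 84), dtype=np.float32)

    def get_state(new_screen):
        frames.append(preprocess(new_screen))
        while len(frames) < 4:            # pad at episode start
            frames.append(frames[-1])
        # Shape (84, 84, 4): the network takes all 4 frames in one forward
        # pass, as channels of a single input, not as 4 separate inputs.
        return np.stack(frames, axis=-1)
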
2 votes • 1 answer

How to assign states in a DQN (Deep Q-Network)?

I am making a flight simulation with autopilot, so I need to make a DQN (Deep Q-Network) to control the autopilot, but I don't know the optimal number of states. The simulation is done in Unity and all the environment and physics are done too, the…
yousif fayed • 331 • 1 • 4 • 20
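
With a DQN the state does not need to be enumerated at all: it is a continuous feature vector fed to the network, so there is no "optimal number of states" to pick, only a choice of input features. A sketch with purely illustrative flight variables:

    import numpy as np

    def make_state(altitude, pitch, roll, yaw, airspeed, vertical_speed):
        # The DQN input is just this vector; its length fixes the network's
        # input size, so no discrete state count is ever needed.
        return np.array([altitude, pitch, roll, yaw,
                         airspeed, vertical_speed], dtype=np.float32)

    state = make_state(1200.0, 0.05, -0.02, 1.57, 65.0, -0.5)
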
2 votes • 0 answers

Is this true? What about Expected SARSA and Double Q-Learning?

I'm studying Reinforcement Learning and I'm facing a problem understanding the difference between SARSA, Q-Learning, Expected SARSA, Double Q-Learning and temporal difference. Can you please explain the difference and tell me when to use each? And…
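
For reference, the four methods differ only in the bootstrap target of the same TD update. A side-by-side sketch (Q, Q1, Q2 are value tables and pi holds the behaviour policy's action probabilities; all names are illustrative):

    import numpy as np

    # All four share: Q[s, a] += alpha * (target - Q[s, a])
    # and differ only in how `target` bootstraps from the next state s2.

    def sarsa_target(Q, r, s2, a2, gamma):
        return r + gamma * Q[s2, a2]              # uses the action actually taken

    def q_learning_target(Q, r, s2, gamma):
        return r + gamma * np.max(Q[s2])          # greedy bootstrap: off-policy

    def expected_sarsa_target(Q, pi, r, s2, gamma):
        return r + gamma * np.dot(pi[s2], Q[s2])  # expectation under the policy

    def double_q_target(Q1, Q2, r, s2, gamma):
        # Select with one table, evaluate with the other: reduces max-bias.
        return r + gamma * Q2[s2, np.argmax(Q1[s2])]
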
2 votes • 0 answers

Displaying OpenAI Gym Environment Render In TKinter

I am currently creating a GUI in Tkinter in which the user can specify hyperparameters for an agent to learn how to play Taxi-v2 in the OpenAI gym environment. I want to know how I should go about displaying the trained agent playing an episode in…
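
Since Taxi-v2 renders as text rather than pixels, one option is to capture the ANSI render each step and push it into a Tkinter label. A sketch assuming an older gym version (where Taxi-v2 exists and render(mode='ansi') returns either a str or a StringIO depending on the release, so both are handled):

    import re
    import tkinter as tk
    import gym

    env = gym.make("Taxi-v2")
    root = tk.Tk()
    label = tk.Label(root, font=("Courier", 12), justify="left")
    label.pack()
    env.reset()

    def show_frame():
        frame = env.render(mode="ansi")
        text = frame if isinstance(frame, str) else frame.getvalue()
        # Strip ANSI colour escape codes, which a Tk label cannot interpret.
        label.config(text=re.sub(r"\x1b\[[0-9;]*m", "", text))

    def step():
        action = env.action_space.sample()  # stand-in for the trained agent
        _, _, done, _ = env.step(action)
        show_frame()
        if not done:
            root.after(300, step)  # schedule next step on the Tk event loop

    show_frame()
    root.after(300, step)
    root.mainloop()
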
2 votes • 1 answer

What is the Full Meaning of the Discount Factor γ (gamma) in Reinforcement Learning?

I'm relatively new to machine learning concepts, and I have been following several lectures/tutorials covering Q-Learning, such as Stanford's Lecture on Reinforcement Learning. They all give short or vague answers as to what exactly gamma's utility is…
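
Concretely, gamma sets how sharply future rewards are discounted in the return G = r0 + gamma*r1 + gamma^2*r2 + …; a tiny numeric illustration:

    # Discounted return of the same reward stream under different gammas.
    rewards = [1.0, 1.0, 1.0, 1.0, 1.0]

    def discounted_return(rewards, gamma):
        return sum(gamma ** t * r for t, r in enumerate(rewards))

    print(discounted_return(rewards, 0.0))  # 1.0: fully myopic
    print(discounted_return(rewards, 0.9))  # ~4.1: future rewards matter
    print(discounted_return(rewards, 1.0))  # 5.0: undiscounted sum
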
2 votes • 1 answer

Inconsistencies between tf.contrib.layer.fully_connected, tf.layers.dense, tf.contrib.slim.fully_connected, tf.keras.layers.Dense

I am trying to implement policy gradient for a contextual bandit problem (https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-1-5-contextual-bandits-bff01d1aad9c). I am defining a model in TensorFlow to solve this…
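
One concrete inconsistency among those layers is their defaults: tf.contrib.layers.fully_connected (and its slim alias) applies ReLU by default, while tf.layers.dense and tf.keras.layers.Dense default to a linear activation. A TF1-style sketch making the activation explicit so the variants match:

    import tensorflow as tf  # TF1-style API, as in the question

    x = tf.placeholder(tf.float32, [None, 16])

    # tf.contrib.layers.fully_connected defaults to activation_fn=tf.nn.relu;
    # tf.layers.dense / tf.keras.layers.Dense default to no activation.
    # Passing the activation explicitly removes that source of divergence.
    a = tf.contrib.layers.fully_connected(x, 8, activation_fn=tf.nn.relu)
    b = tf.layers.dense(x, 8, activation=tf.nn.relu)
    c = tf.keras.layers.Dense(8, activation=tf.nn.relu)(x)
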
2 votes • 0 answers

Q-Learning policy doesn't agree with Value/Policy Iteration

I am playing with pymdptoolbox. It has a built-in forest-management problem. It can generate transition matrices P and R by specifying a state value for the forest function (default value is 3). The implementations of QLearning, PolicyIteration and…
Chenyang • 161 • 1 • 11
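
A minimal way to reproduce that comparison with pymdptoolbox (note QLearning uses sampled updates, so its policy can disagree with the exact solvers unless it runs for enough iterations):

    import mdptoolbox.example
    import mdptoolbox.mdp

    # Built-in forest-management MDP; S is the number of states.
    P, R = mdptoolbox.example.forest(S=3)

    vi = mdptoolbox.mdp.ValueIteration(P, R, 0.96)
    vi.run()

    ql = mdptoolbox.mdp.QLearning(P, R, 0.96)
    ql.run()  # sampled updates; may need many iterations to match vi.policy

    print(vi.policy, ql.policy)
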
2 votes • 1 answer

Convergence of Q-learning on the inverted pendulum

Hello, I'm working on total control of the cartpole problem (inverted pendulum). My aim is for the system to reach stability, meaning all the states (x, xdot, theta and thetadot) should converge to zero. I am using Q-learning with a reward function as…
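
Since the question's reward function is cut off, here is one common choice for driving all four states to zero, a negative quadratic cost over the full state (the weights are illustrative assumptions):

    import numpy as np

    def reward(x, x_dot, theta, theta_dot):
        # Quadratic penalty: maximal (zero) only when every state is zero,
        # so the greedy policy is pulled toward the upright, centred rest state.
        w = np.array([1.0, 0.1, 10.0, 0.1])      # illustrative weights
        s = np.array([x, x_dot, theta, theta_dot])
        return -float(np.dot(w, s ** 2))
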
2 votes • 1 answer

Sarsa and Q-Learning (reinforcement learning) don't converge to the optimal policy

I have a question about my own project for testing reinforcement learning techniques. First let me explain the purpose. I have an agent which can take 4 actions during 8 steps. At the end of these eight steps, the agent can be in 5 possible…
T.L • 21 • 4
2 votes • 2 answers

How to implement Q-learning to approximate an optimal control?

I am interested in implementing Q-learning (or some form of reinforcement learning) to find an optimal protocol. Currently, I have a function written in Python which takes in the protocol or "action" and "state" and returns a new state and a…
tooty44 • 6,829 • 9 • 27 • 39
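
If the environment already exists as a Python function mapping (state, action) to (new state, reward), tabular Q-learning can be wrapped directly around it. A sketch with a hypothetical step function and integer-indexed states and actions:

    import numpy as np

    def step(state, action):
        # Hypothetical stand-in for the question's existing Python function:
        # returns (next_state, reward).
        return (state + action) % 10, -abs(state - 5)

    n_states, n_actions = 10, 4
    Q = np.zeros((n_states, n_actions))
    alpha, gamma, epsilon = 0.1, 0.95, 0.1

    state = 0
    for _ in range(10000):
        if np.random.rand() < epsilon:
            action = np.random.randint(n_actions)   # explore
        else:
            action = int(np.argmax(Q[state]))       # exploit
        next_state, r = step(state, action)
        Q[state, action] += alpha * (r + gamma * np.max(Q[next_state])
                                     - Q[state, action])
        state = next_state
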
2 votes • 1 answer

Unable to learn MountainCar using Q-Learning with Function Approximation

I am trying to implement a linear function approximation for solving MountainCar using Q-learning. I know this environment can't be perfectly approximated with a linear function due to the spiral-like shape of the optimal policy, but the behaviour I…
ivallesp • 2,018 • 1 • 14 • 21
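
For reference, the core of linear Q-learning is the semi-gradient TD update on per-action weight vectors. A sketch with a hypothetical polynomial feature map over MountainCar's (position, velocity) state (tile coding or RBF features usually approximate this environment far better):

    import numpy as np

    n_actions, n_features = 3, 8
    W = np.zeros((n_actions, n_features))  # one weight vector per action

    def phi(state):
        # Hypothetical polynomial features over (position, velocity).
        p, v = state
        return np.array([1.0, p, v, p * v, p * p, v * v, p * p * v, p * v * v])

    def q(state, action):
        return float(np.dot(W[action], phi(state)))

    def update(state, action, reward, next_state, alpha=0.01, gamma=0.99):
        # Semi-gradient Q-learning: differentiate only the prediction side.
        target = reward + gamma * max(q(next_state, a) for a in range(n_actions))
        W[action] += alpha * (target - q(state, action)) * phi(state)
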
2 votes • 2 answers

Model for OpenAI gym's Lunar Lander not converging

I am trying to use deep reinforcement learning with Keras to train an agent to learn how to play the Lunar Lander OpenAI gym environment. The problem is that my model is not converging. Here is my code: import numpy as np import gym from…
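
Two ingredients that often decide whether a DQN converges on LunarLander are an experience-replay buffer and a slowly refreshed target network. A sketch of the replay part (buffer and batch sizes are typical choices, not taken from the question's code):

    import random
    from collections import deque
    import numpy as np

    buffer = deque(maxlen=100000)  # experience replay: decorrelates samples

    def remember(s, a, r, s2, done):
        buffer.append((s, a, r, s2, done))

    def sample_batch(batch_size=64):
        # Uniform random minibatch; assumes len(buffer) >= batch_size.
        batch = random.sample(buffer, batch_size)
        s, a, r, s2, d = map(np.array, zip(*batch))
        return s, a, r, s2, d

    # Target network: keep a frozen copy of the online model and refresh it
    # occasionally (e.g. every 1000 steps) so the regression targets
    # r + gamma * max_a Q_target(s2, a) stay stable between refreshes:
    # target_model.set_weights(model.get_weights())
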
2 votes • 1 answer

How to define a state in Python for reinforcement learning

I need to create a state space for my RL problem, which has about 10 state variables, each of which takes about 2 or 3 values. That would make the state space about 600,000 states. How do I implement this in Python?
Yomal • 362 • 2 • 15
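
A standard way to index such a factored state space is to treat the variables as digits of a mixed-radix number, which NumPy supports directly (the per-variable value counts below are illustrative):

    import numpy as np

    # Number of values each of the 10 state variables can take (illustrative).
    sizes = (3, 3, 2, 3, 2, 3, 3, 2, 3, 3)

    def state_to_index(values):
        # Maps e.g. (0, 2, 1, ...) to a unique integer in [0, prod(sizes)).
        return int(np.ravel_multi_index(values, sizes))

    def index_to_state(index):
        return np.unravel_index(index, sizes)

    n_states = int(np.prod(sizes))  # total table size for tabular methods
    Q = np.zeros((n_states, 4))     # e.g. a Q-table with 4 actions
    idx = state_to_index((0, 2, 1, 0, 1, 2, 0, 1, 2, 0))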