Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function that gives the expected utility (reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well or how badly the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states or actions better than those currently estimated). A common, simple way of handling this trade-off is an epsilon-greedy policy.
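An epsilon-greedy selection over a tabular Q-function can be sketched as follows (the state/action counts and the epsilon value here are illustrative assumptions, not from any particular question):

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(Q, state, n_actions, epsilon=0.1):
    """With probability epsilon explore (random action);
    otherwise exploit (act greedily w.r.t. current Q estimates)."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))   # explore
    return int(np.argmax(Q[state]))           # exploit

# toy example: 5 states, 2 actions
Q = np.zeros((5, 2))
Q[0, 1] = 1.0                                 # action 1 looks best in state 0
action = epsilon_greedy(Q, state=0, n_actions=2, epsilon=0.0)  # pure greedy -> 1
```

With epsilon annealed from 1.0 toward a small floor over training, the agent shifts gradually from exploration to exploitation.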

447 questions
1
vote
0 answers

Is the reward related to previous state or next state?

In the reinforcement learning framework, I am a little bit confused about the reward and how it is related to states. For example, in Q-learning, we have the following formula for updating the Q table: that means that the reward is obtained from…
MadMage
  • 186
  • 1
  • 7
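For reference, in the tabular Q-learning update the reward r is the one observed for the transition (s, a) → s'; a minimal sketch (the learning-rate and discount values are illustrative):

```python
import numpy as np

alpha, gamma = 0.5, 0.9     # learning rate and discount (illustrative)
Q = np.zeros((3, 2))        # toy table: 3 states, 2 actions

def q_update(Q, s, a, r, s_next):
    """r is the reward received after taking a in s and landing in s_next."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

q_update(Q, s=0, a=1, r=1.0, s_next=2)   # Q[0,1] = 0.5 * (1.0 + 0 - 0) = 0.5
```

So the reward is tied to the (state, action) pair just taken, while the bootstrapped max is evaluated at the next state.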
1
vote
1 answer

OpenAI Gym - Maze - Using Q learning- "ValueError: dir cannot be 0. The only valid dirs are dict_keys(['N', 'E', 'S', 'W'])."

I'm trying to train an agent using Q learning to solve the maze. I created the environment using: import gym import gym_maze import numpy as np env = gym.make("maze-v0") Since the states are in [x,y] coordinates and I wanted to have a 2D Q…
Penguin
  • 1,923
  • 3
  • 21
  • 51
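One way to reconcile a 2-D [x, y] state with a Q-table is a sketch like the following, assuming a discrete maze of known size and that the environment's step expects the string directions 'N'/'E'/'S'/'W', as the error message suggests:

```python
import numpy as np

MAZE_W, MAZE_H = 10, 10            # assumed maze dimensions
ACTIONS = ['N', 'E', 'S', 'W']     # the dirs the error message lists as valid

# Q-table indexed by (x, y, action-index)
Q = np.zeros((MAZE_W, MAZE_H, len(ACTIONS)))

def best_action(state):
    """state is an [x, y] pair; return the string action the env expects."""
    x, y = int(state[0]), int(state[1])
    return ACTIONS[int(np.argmax(Q[x, y]))]

# env.step(best_action(obs)) then passes 'N'/'E'/'S'/'W' rather than 0..3
```

Keeping integer indices internally and translating only at the env boundary avoids the ValueError about invalid dirs.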
1
vote
0 answers

Correct approach to improve/retrain an offline model

I have a recommendation system that was trained using Behavior Cloning (BC) with offline data generated using a supervised learning model converted to batch format using the approach described here. Currently, the model is exploring using an…
1
vote
1 answer

How to train an RL agent when the action space consists of n binary actions?

I need to train an RL agent that has to control some switches. Let's imagine that we have n switches, each of which can be turned on (1) or turned off (0). My agent has to decide at each step which ones to turn on and which to turn off, so I want an action…
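One common encoding for n binary switches (a sketch, not taken from the question) is to treat each joint on/off configuration as a single discrete action in {0, …, 2^n − 1} and decode it into bits:

```python
n = 4  # number of switches (assumed)

def decode(action):
    """Map a discrete action index to a tuple of n on/off flags."""
    return tuple((action >> i) & 1 for i in range(n))

def encode(bits):
    """Inverse: tuple of n on/off flags back to a discrete index."""
    return sum(b << i for i, b in enumerate(bits))

assert decode(0b1010) == (0, 1, 0, 1)   # bit i controls switch i
assert encode(decode(9)) == 9
```

Note the exponential blow-up: this is fine for small n, but for large n a factored action space (e.g. `gym.spaces.MultiBinary(n)` with a policy-gradient method, or one Q-head per switch) scales better.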
1
vote
0 answers

Q-learning based Shortest Path algorithm

I'm trying to implement a Q-learning based shortest path algorithm. However, sometimes I'm not getting the same path as the classic shortest path algorithm based on the same origin and destination. Here is how I've modeled the…
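For tabular Q-learning to recover a classic shortest path, a common modelling (a sketch under assumed settings: reward = negative edge cost, gamma = 1, terminal goal, enough epsilon-greedy episodes) looks like this; the graph here is illustrative:

```python
import random
random.seed(0)

# toy weighted directed graph: node -> {neighbor: cost}  (illustrative)
graph = {0: {1: 1, 2: 4}, 1: {2: 1, 3: 5}, 2: {3: 1}, 3: {}}
goal = 3

Q = {s: {a: 0.0 for a in nbrs} for s, nbrs in graph.items()}
alpha, gamma, eps = 0.5, 1.0, 0.2   # gamma = 1 so return = -(total path cost)

for _ in range(2000):                          # episodes from the origin
    s = 0
    while s != goal:
        acts = list(graph[s])
        a = random.choice(acts) if random.random() < eps else max(Q[s], key=Q[s].get)
        r = -graph[s][a]                       # reward = negative edge cost
        nxt = a                                # taking action a moves to node a
        best_next = max(Q[nxt].values(), default=0.0)
        Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
        s = nxt

# extract the greedy path from the learned Q
path, s = [0], 0
while s != goal:
    s = max(Q[s], key=Q[s].get)
    path.append(s)
print(path)   # 0 -> 1 -> 2 -> 3, total cost 3 (vs. 0 -> 2 -> 3, cost 5)
```

Mismatches with Dijkstra usually come from gamma < 1 (which biases toward fewer hops rather than lower cost) or from stopping before the Q-values have converged.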
1
vote
0 answers

AssertionError: defaultdict(&lt;function &lt;lambda&gt; at 0x7f31699ffe18&gt;

I have been working on a DQN using stable baselines and a discrete environment with 3 actions. I am using the RL tutorial…
1
vote
1 answer

How to select an action from a matrix in Q learning when using multiple frames as input

When using deep q-learning I am trying to capture motion by passing a number of grayscale frames as the input, each with the dimensions 90x90. There will be four 90x90 frames passed in to allow the network to detect motion. The multiple frames…
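A common frame-stacking sketch (assuming 90×90 grayscale frames and a stack of 4, as described in the question) keeps a rolling buffer and stacks it into the network input:

```python
from collections import deque
import numpy as np

STACK, H, W = 4, 90, 90
frames = deque(maxlen=STACK)           # rolling buffer of the last 4 frames

def push_frame(frame):
    """Append the newest frame; on the first call, repeat it to fill the stack."""
    if not frames:
        for _ in range(STACK):
            frames.append(frame)
    else:
        frames.append(frame)
    return np.stack(frames, axis=0)    # shape (4, 90, 90), channels-first

state = push_frame(np.zeros((H, W), dtype=np.float32))
state = push_frame(np.ones((H, W), dtype=np.float32))
print(state.shape)   # (4, 90, 90)
```

The network then maps the whole (4, 90, 90) tensor to one Q-value per action, and the agent takes the argmax over that output vector; there is no per-frame action selection.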
1
vote
2 answers

Using matplotlib to plot the mean learning curve of agents playing tic-tac-toe

I have written a Q-learning agent that plays tic-tac-toe against a random player. I want to play the game 20 times and plot a single mean learning curve using matplotlib. The first for loop plays the game twenty times and produces a list of…
Rob
  • 73
  • 7
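Averaging per-run curves before plotting is a one-liner with numpy; a sketch with made-up reward histories (the run count and curve shape are illustrative), where matplotlib's `plt.plot` then draws the single mean curve:

```python
import numpy as np

# hypothetical win-rate histories: 20 independent runs, 100 games each
runs = np.array([[min(1.0, g / 100 + 0.1 * i / 20) for g in range(100)]
                 for i in range(20)])

mean_curve = runs.mean(axis=0)   # average across runs, per game index
std_curve = runs.std(axis=0)     # optional: for a shaded error band

# with matplotlib:
#   plt.plot(mean_curve)
#   plt.fill_between(range(100), mean_curve - std_curve,
#                    mean_curve + std_curve, alpha=0.3)
```

Stacking the runs into one 2-D array (runs × games) and reducing over axis 0 avoids manually summing lists inside the loop.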
1
vote
0 answers

Q values overshoot in Double Deep Q Learning

I am trying to teach the agent to play ATARI Space Invaders video game, but my Q values overshoot. I have clipped positive rewards to 1 (agent also receives -1 for losing a life), so the maximum expected return should be around 36 (maybe I am wrong…
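With rewards clipped to 1 per step, the largest possible discounted return is the geometric series Σ γ^t = 1/(1 − γ), so the ceiling depends on the discount factor rather than being a fixed 36; a quick check (γ = 0.99 is the usual Atari DQN setting, assumed here):

```python
gamma = 0.99                  # common DQN discount for Atari (assumed)
bound = 1.0 / (1.0 - gamma)   # sum of gamma**t for t = 0, 1, 2, ...
print(bound)                  # ~100; Q estimates far above this signal divergence
```

Q-values drifting well past this bound typically point at a missing target network, too-high learning rate, or the known Double-DQN failure modes rather than at the reward scale itself.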
1
vote
0 answers

Reinforcement learning with hard constraints

The environment is a directed graph that consists of nodes, which have their own "goodness" (marked green), and edges, which have prices (marked red). In this environment there exists a price (P) constraint. The goal is to accumulate the most "goodness" points…
1
vote
0 answers

Does Q-Learning apply here?

Let's say that we have an algorithm that given a dataset point, it runs some analysis on it and returns the results. The algorithm has a user-defined parameter X that affects the run-time of the algorithm (result of the algorithm is always constant…
Omid
  • 23
  • 1
  • 2
1
vote
1 answer

OpenAI Gym LunarLander execution considerably slowed down for an unknown reason

I have been playing around with the OpenAI Gym LunarLander testing a DQN neural network. I had gotten to a point where it was slowly learning. Since I had started with the CartPole problem which was solved in a couple of minutes/episodes, I…
Max Michel
  • 575
  • 5
  • 20
1
vote
1 answer

Agent repeats the same action cycle non-stop, Q-learning

How can you prevent the agent from endlessly repeating the same cycle of actions? Presumably through changes to the reward system, but are there general rules you could follow or try to include in your code to prevent such a problem? To be more…
1
vote
0 answers

How does Monte Carlo Exploring Starts work?

Flow Chart I'm having trouble understanding the 4th and 5th step in the flowchart. Am I right to say that the Q value of a particular state and action is the same as the state-action pair value of that same state and action? For the 4th step, does…
1
vote
1 answer

How are n-dimensional state vectors represented in Q-learning?

Using this code: import gym import numpy as np import time """ SARSA on policy learning python implementation. This is a python implementation of the SARSA algorithm in the Sutton and Barto's book on RL. It's called SARSA because - (state, action,…
blue-sky
  • 51,962
  • 152
  • 427
  • 752
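For tabular methods, an n-dimensional discrete state vector is usually turned into a hashable key; a defaultdict sketch (the state contents and action count here are illustrative):

```python
from collections import defaultdict
import numpy as np

n_actions = 4
Q = defaultdict(lambda: np.zeros(n_actions))  # unseen states start at zero

def key(state):
    """Convert an n-dimensional state vector into a hashable dict key."""
    return tuple(np.asarray(state).ravel().tolist())

s = [3, 1, 4]                      # any length of state vector works
Q[key(s)][2] = 1.0                 # update one action value for that state
best = int(np.argmax(Q[key(s)]))   # greedy action for that state -> 2
```

For continuous state vectors the components must first be binned into discrete values before hashing, or the table replaced by a function approximator (the DQN route).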