Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function that gives the expected utility (reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well or how badly the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states or actions better than those currently estimated). A common, simple way of handling this trade-off is an epsilon-greedy policy.
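An epsilon-greedy selection over a tabular Q-function can be sketched as follows (the state/action counts and the epsilon value here are illustrative assumptions, not from any particular question):

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(Q, state, n_actions, epsilon=0.1):
    """With probability epsilon explore (random action);
    otherwise exploit (act greedily w.r.t. current Q estimates)."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))   # explore
    return int(np.argmax(Q[state]))           # exploit

# toy example: 5 states, 2 actions
Q = np.zeros((5, 2))
Q[0, 1] = 1.0                                 # action 1 looks best in state 0
action = epsilon_greedy(Q, state=0, n_actions=2, epsilon=0.0)  # pure greedy -> 1
```

With epsilon annealed from 1.0 toward a small floor over training, the agent shifts gradually from exploration to exploitation.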

447 questions
1
vote
0 answers

Is the reward related to previous state or next state?

In the reinforcement learning framework, I am a little bit confused about the reward and how it is related to states. For example, in Q-learning, we have the following formula for updating the Q table: that means that the reward is obtained from…
MadMage
  • 186
  • 1
  • 7
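For reference, in the tabular Q-learning update the reward r is the one observed for the transition (s, a) → s'; a minimal sketch (the learning-rate and discount values are illustrative):

```python
import numpy as np

alpha, gamma = 0.5, 0.9     # learning rate and discount (illustrative)
Q = np.zeros((3, 2))        # toy table: 3 states, 2 actions

def q_update(Q, s, a, r, s_next):
    """r is the reward received after taking a in s and landing in s_next."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

q_update(Q, s=0, a=1, r=1.0, s_next=2)   # Q[0,1] = 0.5 * (1.0 + 0 - 0) = 0.5
```

So the reward is tied to the (state, action) pair just taken, while the bootstrapped max is evaluated at the next state.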
1
vote
1 answer

OpenAI Gym - Maze - Using Q learning- "ValueError: dir cannot be 0. The only valid dirs are dict_keys(['N', 'E', 'S', 'W'])."

I'm trying to train an agent using Q learning to solve the maze. I created the environment using: import gym import gym_maze import numpy as np env = gym.make("maze-v0") Since the states are in [x,y] coordinates and I wanted to have a 2D Q…
Penguin
  • 1,923
  • 3
  • 21
  • 51
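One way to reconcile a 2-D [x, y] state with a Q-table is a sketch like the following, assuming a discrete maze of known size and that the environment's step expects the string directions 'N'/'E'/'S'/'W', as the error message suggests:

```python
import numpy as np

MAZE_W, MAZE_H = 10, 10            # assumed maze dimensions
ACTIONS = ['N', 'E', 'S', 'W']     # the dirs the error message lists as valid

# Q-table indexed by (x, y, action-index)
Q = np.zeros((MAZE_W, MAZE_H, len(ACTIONS)))

def best_action(state):
    """state is an [x, y] pair; return the string action the env expects."""
    x, y = int(state[0]), int(state[1])
    return ACTIONS[int(np.argmax(Q[x, y]))]

# env.step(best_action(obs)) then passes 'N'/'E'/'S'/'W' rather than 0..3
```

Keeping integer indices internally and translating only at the env boundary avoids the ValueError about invalid dirs.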
1
vote
0 answers

Correct approach to improve/retrain an offline model

I have a recommendation system that was trained using Behavior Cloning (BC) with offline data generated using a supervised learning model converted to batch format using the approach described here. Currently, the model is exploring using an…
1
vote
1 answer

How to train an RL agent when the action space consists of n binary actions?

I need to train an RL agent that has to control some switches. Let's imagine that we have n switches, each of which can be turned on (1) or turned off (0). My agent has to decide at each step which ones to turn on and which to turn off, so I want an action…
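One common encoding for n binary switches (a sketch, not taken from the question) is to treat each joint on/off configuration as a single discrete action in {0, …, 2^n − 1} and decode it into bits:

```python
n = 4  # number of switches (assumed)

def decode(action):
    """Map a discrete action index to a tuple of n on/off flags."""
    return tuple((action >> i) & 1 for i in range(n))

def encode(bits):
    """Inverse: tuple of n on/off flags back to a discrete index."""
    return sum(b << i for i, b in enumerate(bits))

assert decode(0b1010) == (0, 1, 0, 1)   # bit i controls switch i
assert encode(decode(9)) == 9
```

Note the exponential blow-up: this is fine for small n, but for large n a factored action space (e.g. `gym.spaces.MultiBinary(n)` with a policy-gradient method, or one Q-head per switch) scales better.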
1
vote
0 answers

Q-learning based Shortest Path algorithm

I'm trying to implement a Q-learning based shortest path algorithm. However, sometimes I'm not getting the same path as the classic shortest path algorithm based on the same origin and destination. Here is how I've modeled the…
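For tabular Q-learning to recover a classic shortest path, a common modelling (a sketch under assumed settings: reward = negative edge cost, gamma = 1, terminal goal, enough epsilon-greedy episodes) looks like this; the graph here is illustrative:

```python
import random
random.seed(0)

# toy weighted directed graph: node -> {neighbor: cost}  (illustrative)
graph = {0: {1: 1, 2: 4}, 1: {2: 1, 3: 5}, 2: {3: 1}, 3: {}}
goal = 3

Q = {s: {a: 0.0 for a in nbrs} for s, nbrs in graph.items()}
alpha, gamma, eps = 0.5, 1.0, 0.2   # gamma = 1 so return = -(total path cost)

for _ in range(2000):                          # episodes from the origin
    s = 0
    while s != goal:
        acts = list(graph[s])
        a = random.choice(acts) if random.random() < eps else max(Q[s], key=Q[s].get)
        r = -graph[s][a]                       # reward = negative edge cost
        nxt = a                                # taking action a moves to node a
        best_next = max(Q[nxt].values(), default=0.0)
        Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
        s = nxt

# extract the greedy path from the learned Q
path, s = [0], 0
while s != goal:
    s = max(Q[s], key=Q[s].get)
    path.append(s)
print(path)   # 0 -> 1 -> 2 -> 3, total cost 3 (vs. 0 -> 2 -> 3, cost 5)
```

Mismatches with Dijkstra usually come from gamma < 1 (which biases toward fewer hops rather than lower cost) or from stopping before the Q-values have converged.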
1
vote
0 answers

AssertionError: defaultdict(&lt;function &lt;lambda&gt; at 0x7f31699ffe18&gt;

I have been working on a DQN using stable baselines and a discrete environment with 3 actions. I am using the RL tutorial…
1
vote
1 answer

How to select an action from a matrix in Q learning when using multiple frames as input

When using deep q-learning I am trying to capture motion by passing a number of grayscale frames as the input, each with the dimensions 90x90. There will be four 90x90 frames passed in to allow the network to detect motion. The multiple frames…
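A common frame-stacking sketch (assuming 90×90 grayscale frames and a stack of 4, as described in the question) keeps a rolling buffer and stacks it into the network input:

```python
from collections import deque
import numpy as np

STACK, H, W = 4, 90, 90
frames = deque(maxlen=STACK)           # rolling buffer of the last 4 frames

def push_frame(frame):
    """Append the newest frame; on the first call, repeat it to fill the stack."""
    if not frames:
        for _ in range(STACK):
            frames.append(frame)
    else:
        frames.append(frame)
    return np.stack(frames, axis=0)    # shape (4, 90, 90), channels-first

state = push_frame(np.zeros((H, W), dtype=np.float32))
state = push_frame(np.ones((H, W), dtype=np.float32))
print(state.shape)   # (4, 90, 90)
```

The network then maps the whole (4, 90, 90) tensor to one Q-value per action, and the agent takes the argmax over that output vector; there is no per-frame action selection.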
1
vote
2 answers

Using matplotlib to plot the mean learning curve of agents playing tic-tac-toe

I have written a Q-learning agent that plays tic-tac-toe against a random player. I want to play the game 20 times and plot a single mean learning curve using matplotlib. The first for loop plays the game twenty times and produces a list of…
Rob
  • 73
  • 7
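Averaging per-run curves before plotting is a one-liner with numpy; a sketch with made-up reward histories (the run count and curve shape are illustrative), where matplotlib's `plt.plot` then draws the single mean curve:

```python
import numpy as np

# hypothetical win-rate histories: 20 independent runs, 100 games each
runs = np.array([[min(1.0, g / 100 + 0.1 * i / 20) for g in range(100)]
                 for i in range(20)])

mean_curve = runs.mean(axis=0)   # average across runs, per game index
std_curve = runs.std(axis=0)     # optional: for a shaded error band

# with matplotlib:
#   plt.plot(mean_curve)
#   plt.fill_between(range(100), mean_curve - std_curve,
#                    mean_curve + std_curve, alpha=0.3)
```

Stacking the runs into one 2-D array (runs × games) and reducing over axis 0 avoids manually summing lists inside the loop.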
1
vote
0 answers

Q values overshoot in Double Deep Q Learning

I am trying to teach the agent to play ATARI Space Invaders video game, but my Q values overshoot. I have clipped positive rewards to 1 (agent also receives -1 for losing a life), so the maximum expected return should be around 36 (maybe I am wrong…
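With rewards clipped to 1 per step, the largest possible discounted return is the geometric series Σ γ^t = 1/(1 − γ), so the ceiling depends on the discount factor rather than being a fixed 36; a quick check (γ = 0.99 is the usual Atari DQN setting, assumed here):

```python
gamma = 0.99                  # common DQN discount for Atari (assumed)
bound = 1.0 / (1.0 - gamma)   # sum of gamma**t for t = 0, 1, 2, ...
print(bound)                  # ~100; Q estimates far above this signal divergence
```

Q-values drifting well past this bound typically point at a missing target network, too-high learning rate, or the known Double-DQN failure modes rather than at the reward scale itself.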
1
vote
0 answers

Reinforcement learning with hard constraints

The environment is a directed graph that consists of nodes, which have their own "goodness" (marked green), and edges, which have prices (marked red). In this environment there exists a price (P) constraint. The goal is to accumulate the most "goodness" points…
1
vote
0 answers

Does Q-Learning apply here?

Let's say that we have an algorithm that given a dataset point, it runs some analysis on it and returns the results. The algorithm has a user-defined parameter X that affects the run-time of the algorithm (result of the algorithm is always constant…
Omid
  • 23
  • 1
  • 2
1
vote
1 answer

OpenAI Gym LunarLander execution considerably slowed down for an unknown reason

I have been playing around with the OpenAI Gym LunarLander testing a DQN neural network. I had gotten to a point where it was slowly learning. Since I had started with the CartPole problem which was solved in a couple of minutes/episodes, I…
Max Michel
  • 575
  • 5
  • 20
1
vote
1 answer

Agent repeats the same action cycle non-stop, Q-learning

How can you prevent the agent from endlessly repeating the same cycle of actions? Presumably through changes to the reward system, but are there general rules you could follow or try to include in your code to prevent such a problem? To be more…
1
vote
0 answers

How does Monte Carlo Exploring Starts work?

Flow Chart I'm having trouble understanding the 4th and 5th step in the flowchart. Am I right to say that the Q value of a particular state and action is the same as the state-action pair value of that same state and action? For the 4th step, does…
1
vote
1 answer

How are n-dimensional state vectors represented in Q-learning?

Using this code: import gym import numpy as np import time """ SARSA on policy learning python implementation. This is a python implementation of the SARSA algorithm in the Sutton and Barto's book on RL. It's called SARSA because - (state, action,…
blue-sky
  • 51,962
  • 152
  • 427
  • 752
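For tabular methods, an n-dimensional discrete state vector is usually turned into a hashable key; a defaultdict sketch (the state contents and action count here are illustrative):

```python
from collections import defaultdict
import numpy as np

n_actions = 4
Q = defaultdict(lambda: np.zeros(n_actions))  # unseen states start at zero

def key(state):
    """Convert an n-dimensional state vector into a hashable dict key."""
    return tuple(np.asarray(state).ravel().tolist())

s = [3, 1, 4]                      # any length of state vector works
Q[key(s)][2] = 1.0                 # update one action value for that state
best = int(np.argmax(Q[key(s)]))   # greedy action for that state -> 2
```

For continuous state vectors the components must first be binned into discrete values before hashing, or the table replaced by a function approximator (the DQN route).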