Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function giving the expected utility (return) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function that tells how well or how badly the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states or actions better than currently estimated). A common, simple way to handle this trade-off is an epsilon-greedy policy.
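A minimal tabular sketch of the update and the epsilon-greedy trade-off just described. The environment interface (reset() returning a state index, step() returning next state, reward, done) and all hyperparameter values are illustrative assumptions:

```python
import numpy as np

def epsilon_greedy(Q, state, epsilon, n_actions, rng):
    # Explore with probability epsilon, otherwise act greedily.
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = epsilon_greedy(Q, state, epsilon, n_actions, rng)
            next_state, reward, done = env.step(action)
            # Off-policy target: bootstrap from the greedy next action,
            # regardless of what the behaviour policy will actually do.
            target = reward + gamma * np.max(Q[next_state]) * (not done)
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q
```

The max over the next state's actions is what makes the method off-policy: the target ignores which action the exploring behaviour policy takes next.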

447 questions
2
votes
0 answers

AttributeError: 'Environment1' object has no attribute 'observation_space'

I'm using Keras to build a DDPG model. I followed the official instructions from here: enter link description here. But I changed the gym env to my own env: import tensorflow as tf from tensorflow.keras import layers import matplotlib.pyplot as…
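The error usually means the custom class never defines the attributes the tutorial code reads from a gym env. A minimal sketch of the two attributes a gym-style env is expected to expose (the shapes and bounds here are illustrative assumptions):

```python
import numpy as np
from gym import spaces

class Environment1:
    def __init__(self):
        # DDPG example code reads these two attributes from the env,
        # so a custom env must define them itself.
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(3,), dtype=np.float32)
        self.action_space = spaces.Box(
            low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
```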
2
votes
2 answers

DQN understanding input and output (layer)

I have a question about the input and output (layer) of a DQN. E.g. two points: P1(x1, y1) and P2(x2, y2); P1 has to walk towards P2. I have the following information: current position of P1 (x/y), current position of P2 (x/y), distance P1-P2…
Tailor
  • 193
  • 1
  • 12
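One common layout for the question above: the input layer takes the observed features, and the output layer has one Q-value per discrete action. A sketch assuming five input features (both positions plus the distance) and four movement actions; the layer sizes are illustrative:

```python
import torch.nn as nn

# Input: (x1, y1, x2, y2, distance) -> 5 features.
# Output: one Q-value per action, e.g. up/down/left/right -> 4 values.
dqn = nn.Sequential(
    nn.Linear(5, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 4),  # no activation: Q-values are unbounded
)
```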
2
votes
1 answer

Why is the learning rate for Q-learning important in stochastic environments?

As stated in the Wikipedia article https://en.wikipedia.org/wiki/Q-learning#Learning_Rate, for a stochastic problem the learning rate is important for convergence. I tried to find the "intuition" behind the reason without any mathematical…
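The intuition, in code: with a stochastic reward, a learning rate of 1 makes the estimate jump to the most recent noisy sample, while a smaller (or decaying) rate averages the noise out. A toy sketch; the reward distribution is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
q_fast, q_slow = 0.0, 0.0
for t in range(1, 10001):
    r = rng.normal(loc=1.0, scale=2.0)  # noisy reward, true mean 1.0
    q_fast += 1.0 * (r - q_fast)        # alpha = 1: tracks the last sample
    q_slow += (1.0 / t) * (r - q_slow)  # decaying alpha: converges to the mean
print(q_fast, q_slow)  # q_slow ends near 1.0; q_fast is still noisy
```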
2
votes
1 answer

PyTorch DQN/DDQN using .detach() causes very weird loss (increases exponentially) and does not learn at all

Here is my implementation of DQN and DDQN for CartPole-v0 which I think is correct. import numpy as np import torch import torch.nn as nn import torch.nn.functional as F import gym import torch.optim as optim import random import os import…
Yilin L.
  • 75
  • 6
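For the question above, the usual pattern is to cut the target out of the computation graph so gradients only flow through the online network; detaching the wrong tensor (or not detaching at all) makes the loss chase itself. A sketch of the standard DQN target, assuming q_net and target_net are PyTorch modules mapping state batches to rows of Q-values, actions is a LongTensor, and dones is a float tensor (all names are assumptions):

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, states, actions, rewards,
             next_states, dones, gamma=0.99):
    # Q(s, a) for the actions actually taken in the batch.
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():  # same effect as .detach(): no gradient through the target
        next_q = target_net(next_states).max(dim=1).values
        target = rewards + gamma * next_q * (1 - dones)
    return F.smooth_l1_loss(q_sa, target)
```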
2
votes
1 answer

Writing a good reward function for my Q-learning agent

I'm still new to ML. Recently I learned Q-learning and coded it by hand (not using a library like Keras or TensorFlow). The problem I'm facing is knowing how to write a good reward function for my agent. I started by writing the following…
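A common starting point for a reaching task like this: reward progress toward the goal and give a terminal bonus, rather than only rewarding the final state. A sketch, where prev_dist and curr_dist are hypothetical distance measurements from the game and the constants are illustrative:

```python
def reward(prev_dist, curr_dist, reached_goal, step_penalty=0.01):
    if reached_goal:
        return 10.0  # terminal bonus for reaching the goal
    # Dense shaping: positive when the agent got closer to the goal,
    # with a small per-step penalty to discourage dawdling.
    return (prev_dist - curr_dist) - step_penalty
```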
2
votes
1 answer

What is the best way to save a Q table to file?

I'm planning to save my Q table to a text file (as a string) for future use, but I wondered what the pitfalls of this might be. Also, any advice on a better way to store the Q table would be appreciated: would it be better to store it as…
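A text dump loses dtype and shape and can round the floats; a binary format avoids those pitfalls. A minimal sketch using NumPy's native format, assuming the Q table is a NumPy array (the table here is a placeholder):

```python
import numpy as np

q_table = np.random.rand(16, 4)  # placeholder Q table

np.save("q_table.npy", q_table)  # lossless binary, keeps shape and dtype
q_table = np.load("q_table.npy")

# If the table is a dict keyed by state, pickle works instead:
# import pickle
# with open("q_table.pkl", "wb") as f:
#     pickle.dump(q_table, f)
```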
2
votes
2 answers

Deep Q-learning **WITHOUT** OpenAI Gym

Does anyone have or know of any tutorials/courses that teach Q-learning without the use of OpenAI Gym? I'm trying to make a convolutional Q-learning model, and I have no problem doing this with PyTorch and OpenAI Gym, easy! But when I try and…
oz.vegas
  • 129
  • 7
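Gym is only a thin interface; anything with reset() and step() can drive a DQN training loop. A minimal sketch of a hand-rolled environment exposing that interface (the dynamics are an illustrative toy):

```python
import numpy as np

class GridWorld:
    """Toy 1-D world: start at position 0, reach position 4."""

    def reset(self):
        self.pos = 0
        return np.array([self.pos], dtype=np.float32)

    def step(self, action):
        # Actions: 0 = move left, 1 = move right.
        self.pos += 1 if action == 1 else -1
        self.pos = max(0, self.pos)
        done = self.pos >= 4
        reward = 1.0 if done else -0.1
        return np.array([self.pos], dtype=np.float32), reward, done
```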
2
votes
0 answers

Improving GPU utilization using TensorFlow

I am using Keras with the TensorFlow backend to create a deep Q-learning agent to play Atari games on OpenAI Gym. But when I train the model, my GPU utilization stays around 8 to 10 percent. I am new to this stuff and am unable to figure out how to improve…
Rushik
  • 29
  • 2
2
votes
1 answer

Can you do q-learning with a Tuple observation space?

I built a custom OpenAI Gym environment that uses a simple Tuple observation space. self.observation_space = spaces.Tuple((spaces.Discrete(2), spaces.Discrete(1))) But when I try to use Q-learning examples, they use observation_space.n. Is there a…
Daniel Rojas
  • 407
  • 2
  • 5
  • 16
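Tabular examples index the Q table with observation_space.n, which Tuple spaces don't have; a Tuple of Discrete spaces can instead be flattened to a single index. A sketch (the component sizes and action count are illustrative):

```python
import numpy as np
from gym import spaces

obs_space = spaces.Tuple((spaces.Discrete(2), spaces.Discrete(3)))

# Sizes of each Discrete component, e.g. (2, 3) -> 6 joint states.
dims = tuple(s.n for s in obs_space.spaces)
n_states = int(np.prod(dims))

def flatten(obs):
    # Map a tuple observation like (1, 2) to a single table row index.
    return int(np.ravel_multi_index(obs, dims))

q_table = np.zeros((n_states, 4))  # 4 is a placeholder action count
```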
2
votes
0 answers

How does loss.backward() work for batches?

So I am training a DDQN to play Connect Four at the moment. At each state, the network predicts the best action and moves accordingly. The code looks basically as follows: for epoch in range(num_epochs): for i in…
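For the question above: calling backward() once on a loss that is averaged over the batch accumulates the same gradients as averaging per-sample gradients; no per-sample loop is needed. A minimal sketch (the layer shapes and batch are placeholders):

```python
import torch
import torch.nn as nn

net = torch.nn.Linear(4, 2)
states = torch.randn(32, 4)   # batch of 32 states
targets = torch.randn(32, 2)  # matching batch of targets

loss = nn.functional.mse_loss(net(states), targets)  # mean over the batch
loss.backward()  # one call; gradients are averaged over all 32 samples
```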
2
votes
1 answer

Approximate the Q-function with a NN in the FrozenLake exercise

import numpy as np import gym import random import time from IPython.display import clear_output env = gym.make("FrozenLake-v0") action_space_size = env.action_space.n state_space_size = env.observation_space.n q_table =…
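To replace the table in this exercise, a common minimal change is to one-hot encode the discrete state and let a small network output one Q-value per action. A PyTorch sketch, with illustrative layer sizes:

```python
import torch
import torch.nn as nn

n_states, n_actions = 16, 4  # FrozenLake-v0 sizes

q_net = nn.Sequential(nn.Linear(n_states, 64), nn.ReLU(),
                      nn.Linear(64, n_actions))

def q_values(state):
    # One-hot encode the integer state, mirroring a row lookup in the table.
    x = torch.zeros(n_states)
    x[state] = 1.0
    return q_net(x)
```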
2
votes
0 answers

Q-learning: how to include a terminal state in the update rule?

I use Q-learning to determine the optimal path of an agent. I know in advance that my path is composed of exactly 3 states (so after 3 states I reach a terminal state). I would like to know how to include that in the update rule of the…
Hajar Elhammouti
  • 103
  • 2
  • 10
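The standard convention for the question above: when the next state is terminal, drop the bootstrap term, so the target is the reward alone. A sketch of the update with that case handled:

```python
import numpy as np

def update(Q, s, a, r, s_next, terminal, alpha=0.1, gamma=0.99):
    # At a terminal state there is no future return to bootstrap from.
    target = r if terminal else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```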
2
votes
1 answer

Questions about Q-learning in a 2D maze

I just read about Q-learning and I'm not sure if I understand it correctly. All the examples I saw are rat-in-a-maze, where the rat must move towards the cheese and the cheese doesn't move. I'm just wondering if it's possible to do Q-learning in a…
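Q-learning still works when the cheese moves, as long as the state captures everything relevant, i.e. both positions rather than only the rat's. A sketch of such a joint state encoding, assuming a 10x10 grid (the size is illustrative):

```python
def encode_state(rat_xy, cheese_xy, width=10, height=10):
    # Joint state over both positions; the Q table then has
    # (width*height)**2 rows instead of width*height.
    rat = rat_xy[1] * width + rat_xy[0]
    cheese = cheese_xy[1] * width + cheese_xy[0]
    return rat * (width * height) + cheese
```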
2
votes
0 answers

Simple neural network with Q-learning

I am relatively new to reinforcement learning and am designing an algorithm to implement reinforcement learning in my game. I have studied neural networks and Q-learning. Based on my game specs, I opted for a model-free, off-policy…
2
votes
1 answer

OpenAI Gym action_space: how to limit choices

Suppose the action space is a game with 5 doors and you can choose 2, and only 2, at each step. How could that be represented as an action_space? self.action_space = spaces.Box( np.array([0,0,0,0,0]), np.array([+1,+1,+1,+1,+1])) # Using the above…
Kevin
  • 3,690
  • 5
  • 35
  • 38
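One common workaround for "exactly 2 of 5": enumerate the valid combinations and use a Discrete space over them, so the agent can never pick an invalid set. A sketch:

```python
from itertools import combinations
from gym import spaces

doors = list(combinations(range(5), 2))  # all 10 valid 2-door choices
action_space = spaces.Discrete(len(doors))

def decode(action):
    # Map the chosen index back to the pair of doors it represents.
    return doors[action]
```

This keeps the action space small and trivially valid, at the cost of growing combinatorially if the counts get large.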