Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function that gives the expected utility (return) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well or how badly the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) versus exploration (acting randomly to discover new states or better actions than currently estimated). A common, simple way of handling this trade-off is an epsilon-greedy policy.
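The epsilon-greedy trade-off described above can be sketched in a few lines. This is a minimal illustration, not a full training loop; the state/action counts and hyperparameters are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 16, 4          # illustrative sizes
q_table = np.zeros((n_states, n_actions))

def epsilon_greedy(state, epsilon):
    """With probability epsilon act randomly (explore), else greedily (exploit)."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(q_table[state]))

def q_update(state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (target - q_table[state, action])
```

Note the `max` over next-state actions in `q_update`: the update bootstraps from the greedy action regardless of the action the behaviour policy actually takes next, which is what makes Q-learning off-policy.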

447 questions
0
votes
0 answers

How to take off objects from map when using Q-learning with OpenAI-Gym in Python

I'm trying to learn how to use Q-learning with OpenAI Gym in Python, and I modified the existing gym 'FrozenLake-v0' to make an example where the agent goes through a labyrinth map and picks up apples - there is a reward for every picked apple.…
0
votes
3 answers

What does "IndexError: index 20 is out of bounds for axis 1 with size 20" mean?

I was working on Q-learning in a maze environment. It was working fine at the initial stage, but afterward I got the following: max_future_q = np.max(q_table[new_discrete_state]) IndexError: index 20 is out of bounds for axis 1 with…
Sherin shibu
  • 5
  • 1
  • 2
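An IndexError like the one above commonly appears when a continuous observation is discretized and the topmost value maps to an index equal to the table size. A minimal sketch of a bounds-safe discretizer (the observation range and bucket count here are illustrative assumptions, not from the question):

```python
n_buckets = 20
low, high = -1.0, 1.0  # assumed observation range for illustration

def discretize(obs):
    """Map a continuous observation into a bucket index 0..n_buckets-1.

    Without the clamp, obs == high maps to index n_buckets, which is
    exactly an "index 20 is out of bounds for axis ... with size 20" error.
    """
    idx = int((obs - low) / (high - low) * n_buckets)
    return min(max(idx, 0), n_buckets - 1)
```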
0
votes
1 answer

Are Q-learning agents required to converge towards actual state-action values?

It is my understanding that Q-learning attempts to find the actual state-action values for all states and actions. However, my hypothetical example below seems to indicate that this is not necessarily the case. Imagine a Markov decision process…
DarkZero
  • 51
  • 5
0
votes
1 answer

IronPython not returning dictionary keys as expected

I am trying to create a Q-table as a dictionary filled with random values in Grasshopper (a parametric design tool that uses IronPython as its interpreter). When I enter the code as shown in image 1, I receive a dictionary as shown in image 2. Keys are…
0
votes
1 answer

Why does the Pacman game pause automatically for a few seconds and then resume?

I was trying the Pacman game with Q-learning (reinforcement learning) in Java. However, I could see the game pausing automatically for a few seconds and then resuming. I just want to know the reason for this. YouTube video…
iamarkaj
  • 1
  • 1
0
votes
1 answer

Bellman equation

In the Bellman equation, where s = a particular state (room), a = action (moving between the rooms), s′ = the state to which the robot goes from s, γ = the discount factor, R(s, a) = a reward function which takes a state s and an action a and outputs a reward…
TinyCoder
  • 33
  • 10
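The symbols listed in the excerpt above (s, a, s′, a discount factor, R(s, a)) are those of the standard Q-learning Bellman optimality equation; writing γ for the discount factor, it reads:

```latex
Q(s, a) = R(s, a) + \gamma \max_{a'} Q(s', a')
```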
0
votes
1 answer

ValueError: cannot reshape array of size 1 into shape (1,4)

Commenting out the offending code also gives me this error: AssertionError: Cannot call env.step() before calling reset(). I'm trying to follow along with a tutorial on OpenAI Gym and am getting a NumPy error when reshaping the state of my environment. Both of…
0
votes
0 answers

Reinforcement learning: how to use a Q-learning algorithm with a Reinforce.jl environment?

I've created this MDP environment using Reinforce.jl. It's supposed to mimic the cake-eating problem, or consumption-savings problem. I want to use a Q-learning algorithm to find the optimal policy. However, the Reinforce.jl package only has a SARSA policy…
0
votes
0 answers

Q-learning network in a custom environment chooses the same action every time, despite the heavy negative reward

So I plugged QLearningDiscreteDense into a dots-and-boxes game I made. I created a custom MDP environment for it. The problem is that it chooses action 0 every time; the first time it works, but then it's no longer an available action, so it's an…
0
votes
1 answer

Multi-agent (not deep) reinforcement learning: modeling the problem

I have N agents/users accessing a single wireless channel, and at each time step only one agent can access the channel and receive a reward. Each user has a buffer that can store B packets, which I assume to be infinite. Each user…
0
votes
1 answer

Custom loss function for Deep Q-Learning

The following problem occurred while tackling a reinforcement learning task. In my code I eventually run into the following issue when calculating the loss: my neural network outputs 4 Q-values (given a state as input, it outputs the Q-value…
Peter
  • 183
  • 1
  • 1
  • 9
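A common resolution to the loss problem described above, sketched framework-agnostically with NumPy (the function name and array shapes are illustrative assumptions): compute the error only on the Q-value of the action actually taken, leaving the network's other outputs out of the loss.

```python
import numpy as np

def dqn_loss(q_pred, actions, targets):
    """Mean squared error between Q(s, a_taken) and the TD target.

    q_pred  : (batch, n_actions) network outputs
    actions : (batch,) indices of the actions actually taken
    targets : (batch,) TD targets, r + gamma * max_a' Q_target(s', a')
    """
    rows = np.arange(len(actions))
    q_taken = q_pred[rows, actions]  # select one Q-value per sample
    return float(np.mean((q_taken - targets) ** 2))
```

In an autodiff framework the same selection is usually done with a gather/one-hot mask so that gradients flow only through the chosen action's output.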
0
votes
1 answer

Incompatible array types are mixed in the forward input (LinearFunction) in machine learning

I have trained a deep Q-learning model using Chainer: class Q_Network(chainer.Chain): def __init__(self, input_size, hidden_size, output_size): super(Q_Network, self).__init__( fc1=L.Linear(input_size,…
William
  • 3,724
  • 9
  • 43
  • 76
0
votes
1 answer

Deep Q-Learning for grid world

Has anyone implemented Deep Q-learning to solve a grid-world problem where the state is the [x, y] coordinates of the player and the goal is to reach a certain coordinate [A, B]? The reward setting could be -1 for each step and +10 for reaching [A,…
corvo
  • 676
  • 2
  • 7
  • 20
0
votes
1 answer

What decides the epsilon decay value in reinforcement learning?

I've been learning Q-learning from the YouTube lecture below: https://www.youtube.com/watch?v=Gq1Azv_B4-4&list=PLlMOxjd7OfgNxJSgF8pAs3_qMion-X1QI&index=2 In this tutorial, the author uses an epsilon methodology like this (I cut the details out): import…
Baaam Park
  • 415
  • 1
  • 5
  • 7
0
votes
1 answer

How to set coordinates as a state space (range) for use in a Q-table?

Suppose I have a class Player that I want to use as my agent. I want all the coordinates possible in my environment to be my state space. In my environment, I want to use the coordinates of the player as my state. How should I go about setting my…
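For a bounded grid like the one the question describes, one common approach is to index the Q-table directly by the (x, y) coordinates. A minimal sketch (grid dimensions and action count are illustrative assumptions):

```python
import numpy as np

WIDTH, HEIGHT, N_ACTIONS = 10, 10, 4  # illustrative grid and action count

# One vector of Q-values per (x, y) coordinate: shape (WIDTH, HEIGHT, N_ACTIONS)
q_table = np.zeros((WIDTH, HEIGHT, N_ACTIONS))

def best_action(x, y):
    """Greedy action for the player standing at coordinate (x, y)."""
    return int(np.argmax(q_table[x, y]))
```

This works only while the coordinate range is small and discrete; for continuous or large coordinate spaces, discretization into buckets or a function approximator is needed instead.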