Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to learn an action-value function giving the expected utility (cumulative discounted reward) of taking a given action in a given state and acting optimally thereafter.
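The standard one-step Q-learning update for the learned action-value function Q, with learning rate α and discount factor γ, is:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```

The max over next actions, rather than the action the behaviour policy actually takes next, is what makes Q-learning off-policy.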

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well, or how badly, the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states or actions better than currently estimated). A common, simple way of handling this trade-off is an epsilon-greedy policy, as sketched below.
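As an illustration, epsilon-greedy action selection over a tabular Q-function might look like the following minimal sketch (the table shape, epsilon value, and state are arbitrary example choices):

```python
import numpy as np

def epsilon_greedy(Q, state, epsilon=0.1, rng=np.random.default_rng()):
    """With probability epsilon explore (uniform random action),
    otherwise exploit (greedy action under the current Q estimates)."""
    n_actions = Q.shape[1]
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))   # explore
    return int(np.argmax(Q[state]))           # exploit

# Example: a Q-table for 25 states and 5 actions, initialised to zero.
Q = np.zeros((25, 5))
action = epsilon_greedy(Q, state=0)
```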

447 questions
0
votes
0 answers

Using Q-learning to solve a knapsack problem

The question is: sugar is 1 gram for 1 dollar, a cookie is 7 grams for 5 dollars, and ice is 12 grams for 10 dollars. Now I have 29 dollars; how should I buy to get the heaviest total? I have found the code on the Internet, but I don't know how to modify it to solve my…
Lucas
  • 11
  • 1
0
votes
1 answer

Is it okay to remove the oldest experiences of a DQN?

I have created a DQN with a max memory size of 100000. I have a function that removes the oldest element in the memory if its size is greater than the max size. When I ran it for 200 episodes, I noticed that the memory was already full at the…
KKK
  • 507
  • 3
  • 12
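For reference, a bounded replay memory that evicts its oldest transitions automatically is commonly implemented with a deque, which removes the need for a manual deletion function; a minimal sketch (capacity and tuple layout are illustrative):

```python
import random
from collections import deque

class ReplayMemory:
    """FIFO experience buffer: appending past maxlen silently drops the oldest item."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
```

Whether dropping the oldest experiences is "okay" is exactly what the question asks; the deque merely makes that FIFO policy the default behaviour.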
0
votes
1 answer

Why does the score (accumulated reward) go down during the exploitation phase in this Deep Q-Learning model?

I'm having a hard time trying to make a Deep Q-Learning agent find the optimal policy. This is what my current model looks like in TensorFlow: model = Sequential() model.add(Dense(units=32, activation="relu",…
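The excerpt above is cut off; a complete network in the same style might look like the sketch below. The input size, second layer, and action count are assumptions for illustration, not the asker's actual values:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

n_inputs, n_actions = 4, 2   # assumed state and action dimensions

model = Sequential()
model.add(Dense(units=32, activation="relu", input_shape=(n_inputs,)))
model.add(Dense(units=32, activation="relu"))
model.add(Dense(units=n_actions, activation="linear"))  # one Q-value per action
model.compile(optimizer="adam", loss="mse")
```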
0
votes
1 answer

Q-agent is really broken, can't decide between a reward of 0 and -1

I was using a DQN for something; it wasn't working. I simplified the problem so that there are 2 actions: 0 and 1. Each action corresponds to a single reward: 0 or -1. Still, my Q-agent is consistently confused, giving the two actions wild values in…
0
votes
1 answer

Q-agent is learning not to take any actions

I'm training a deep Q network to trade stocks; it has two possible actions: 0: wait; 1: buy a stock if one isn't held, sell it if one is. It gets, as input, the value of the stock it bought, the current value of the stock and the values of…
RichKat
  • 57
  • 1
  • 8
0
votes
1 answer

Reinforcement learning: Q-learning to determine the order to cast spells optimally?

Say I have a wizard with 20 spells, each of which does different things: sometimes direct damage, sometimes disabling, sometimes protecting, etc. He has a fight with 10 orcs and I want to determine an optimal order of spell casting to kill the…
0
votes
2 answers

What's wrong with Dyna-Q? (Dyna-Q vs Q-learning)

I implemented the Q-learning algorithm and used it on FrozenLake-v0 on OpenAI Gym. I am getting 185 total rewards during training and 7333 total rewards during testing over 10000 episodes. Is this good? I also tried the Dyna-Q algorithm, but it is…
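For context, Dyna-Q augments the ordinary Q-learning update with extra planning updates replayed from a learned model of the environment; a tabular sketch of the planning loop, assuming a deterministic environment and dict-based Q and model structures:

```python
import random

def dyna_q_planning(Q, model, n_planning, alpha=0.1, gamma=0.99):
    """Replay n_planning simulated transitions from the learned model.

    `model` maps each previously observed (state, action) pair to the
    (reward, next_state) that followed it, as in tabular Dyna-Q.
    """
    for _ in range(n_planning):
        state, action = random.choice(list(model.keys()))
        reward, next_state = model[(state, action)]
        best_next = max(Q[next_state].values())
        Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
```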
0
votes
0 answers

Deep Q-learning: why does using the same net for the target net and the prediction net result in instability?

For deep Q-learning I can kind of imagine the neural net as the Q-table of normal Q-learning. So if in Q-learning the Q-table is updated in place, why can't we use the same net for the target Q-net and the prediction Q-net? I searched on Google…
J.R.
  • 769
  • 2
  • 10
  • 25
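The usual answer, following the DQN paper (Mnih et al., 2015), is that bootstrapped targets computed from the very network being updated chase a moving target; a separate target network, synced only periodically, keeps the targets fixed between syncs. A minimal Keras-style hard-update sketch (the sync interval is an assumed hyperparameter):

```python
def maybe_sync_target(online_net, target_net, step, sync_every=1_000):
    """Copy the online network's weights into the frozen target network
    every sync_every training steps (a "hard" target update)."""
    if step % sync_every == 0:
        target_net.set_weights(online_net.get_weights())
```

Targets are then computed with target_net while gradients flow only through online_net.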
0
votes
0 answers

How can I interpolate missing Reward-matrix entries (Q-learning)?

I have a simple game on a grid. 25 states, five actions per state (left, right, up, down, stay). There might be special rules for edges and corners, but these won't matter here. My reward matrix (below) is pretty sparse, but this is all the data I…
Shay
  • 1,368
  • 11
  • 17
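One plain way to hold such a sparse reward matrix before any interpolation is a 25×5 array with NaN marking unobserved entries, which keeps the known/unknown split explicit; a sketch (the two filled entries are placeholders, not the question's data):

```python
import numpy as np

n_states, n_actions = 25, 5
R = np.full((n_states, n_actions), np.nan)   # NaN = reward not yet observed

R[0, 1] = 1.0     # placeholder observed entries
R[6, 3] = -1.0

observed = ~np.isnan(R)
print(f"{observed.sum()} of {R.size} entries observed")
```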
0
votes
0 answers

Deep Q-Learning: How to visualize convergence?

I have trained an RL agent in an environment similar to the Puckworld. There's no puck though! The agent is in continuous space and wants to reach a fixed target. Each episode the agent is born at a random location and there is added noise to each…
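A common way to visualize convergence in a noisy setup like this is to plot a moving average of the per-episode return, which smooths out the injected randomness; a sketch assuming the returns have already been collected (the data here is synthetic):

```python
import numpy as np
import matplotlib.pyplot as plt

def moving_average(x, window=100):
    """Rolling mean over the last `window` episodes."""
    return np.convolve(x, np.ones(window) / window, mode="valid")

returns = np.random.randn(2000).cumsum()          # synthetic placeholder data
plt.plot(returns, alpha=0.3, label="raw return")
plt.plot(np.arange(99, len(returns)),             # offset by window - 1
         moving_average(returns), label="100-episode moving average")
plt.xlabel("episode"); plt.ylabel("return"); plt.legend(); plt.show()
```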
0
votes
1 answer

How do I set up a state space for Q-learning?

This is apparently very obvious and basic, because I can't find any tutorials on it, but how do I set up a state space for a Q-learning environment? If I understand correctly, every state needs to be associated with a single value, right? If so,…
RichKat
  • 57
  • 1
  • 8
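For a tabular agent the answer is essentially yes: each distinct observation must map to one row index of the Q-table. For a grid world this is usually done by flattening the coordinates; a minimal sketch (the grid size is an assumed example):

```python
WIDTH, HEIGHT = 5, 5                  # assumed grid dimensions

def state_index(x, y):
    """Flatten an (x, y) grid position into a single Q-table row index."""
    return y * WIDTH + x              # indices 0 .. WIDTH * HEIGHT - 1

assert state_index(0, 0) == 0
assert state_index(4, 4) == 24        # 25 distinct states in total
```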
0
votes
1 answer

How many states could I work with on my ordinary home computer when using Q-learning?

How many states could I work with on my ordinary home computer when I want to implement a reinforcement learning algorithm such as Q-Learning? 1 thousand, 1 million, more?
MMM
  • 373
  • 1
  • 4
  • 12
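As a rough back-of-the-envelope answer, a tabular Q-table costs states × actions × bytes-per-entry, so even millions of states fit comfortably in RAM when the action set is small; a sketch (10 actions and float32 entries are assumptions):

```python
def q_table_bytes(n_states, n_actions, bytes_per_entry=4):   # float32 entries
    return n_states * n_actions * bytes_per_entry

for n in (1_000, 1_000_000, 100_000_000):
    mb = q_table_bytes(n, n_actions=10) / 1e6
    print(f"{n:>11,} states x 10 actions ~ {mb:,.0f} MB")
```

In practice the limit is usually how many states the agent can visit often enough to learn, not the memory itself.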
0
votes
1 answer

Does the training loss diagram show over-fitting? (Deep Q-learning)

The diagram below shows the training loss values against epochs. Based on the diagram, does it mean the model is over-fitting? If not, what is causing the spikes in the loss values across epochs? Overall, it can be observed that the loss value is in…
Yeo Keat
  • 143
  • 1
  • 9
0
votes
1 answer

DQN Model ValueError: setting an array element with a sequence

(All references to code can be found at https://github.com/EXJUSTICE/Doom_DQN_GC/blob/master/TF2_Doom_GC_CNN.ipynb) Background I apologize for the length of this post; I wanted it to be as clear as possible. I've been adapting some Atari OpenAI gym…
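This particular ValueError typically means NumPy was asked to build a rectangular array from sequences of unequal shape, which often happens when stacking differently-shaped frames or states from replay memory; a minimal reproduction (the shapes are illustrative, not taken from the linked notebook):

```python
import numpy as np

frames = [np.zeros((84, 84)), np.zeros((84, 84))]
batch = np.array(frames)                  # fine: equal shapes stack to (2, 84, 84)

ragged = [np.zeros((84, 84)), np.zeros((80, 80))]
try:
    np.array(ragged, dtype=np.float32)    # unequal shapes raise ValueError
except ValueError as e:
    print("ValueError:", e)
```

The fix is to make every element the same shape (e.g. consistent preprocessing and cropping) before stacking.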
0
votes
0 answers

Is there any method in reinforcement learning to select multiple simultaneous actions?

I'm working on a research project that involves the application of reinforcement learning to planning and decision-making problems. Typically, these problems involve picking (sampling) multiple actions within a state based on ranking [max_q to…
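One straightforward way to pick several simultaneous actions from a Q-vector, in the ranking spirit the excerpt describes, is to take the top-k indices by estimated value; a sketch (k and the Q-values are placeholders):

```python
import numpy as np

def top_k_actions(q_values, k=3):
    """Return the indices of the k highest Q-values, best first."""
    return np.argsort(q_values)[::-1][:k]

q = np.array([0.2, 1.5, -0.3, 0.9, 0.4])
print(top_k_actions(q))   # -> [1 3 4]
```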