Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function that gives the expected utility (return) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function that tells how well or how badly the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states or actions better than currently estimated). A common, simple way of handling this trade-off is an epsilon-greedy policy, as sketched below.
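Below is a minimal sketch of tabular Q-learning with an epsilon-greedy policy, assuming a small discrete environment; the env.reset()/env.step() interface and all names are illustrative, not from any particular library.

    import numpy as np

    def q_learning(env, n_states, n_actions, episodes=500,
                   alpha=0.1, gamma=0.99, epsilon=0.1):
        """Tabular Q-learning; env is assumed to expose reset() -> state
        and step(action) -> (next_state, reward, done)."""
        Q = np.zeros((n_states, n_actions))
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # Epsilon-greedy: explore with probability epsilon,
                # otherwise act greedily w.r.t. the current estimates.
                if np.random.rand() < epsilon:
                    action = np.random.randint(n_actions)
                else:
                    action = int(np.argmax(Q[state]))
                next_state, reward, done = env.step(action)
                # Off-policy TD update toward the greedy bootstrap target.
                target = reward + gamma * np.max(Q[next_state]) * (not done)
                Q[state, action] += alpha * (target - Q[state, action])
                state = next_state
        return Q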

447 questions
4
votes
1 answer

What exactly is the difference between Q, V (value function), and reward in Reinforcement Learning?

In the context of Double Q or Dueling Q Networks, I am not sure if I fully understand the difference, especially with V. What exactly is V(s)? How can a state have an inherent value? If we are considering this in the context of trading stocks…
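For reference, the standard textbook relationships between the three quantities (not specific to the question above): the reward r is the one-step signal from the environment, Q^π(s,a) is the expected return from taking action a in state s and following policy π afterwards, and V^π(s) is Q^π averaged over the policy's action choices:

    V^{\pi}(s) = \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ Q^{\pi}(s,a) \big],
    \qquad
    Q^{\pi}(s,a) = \mathbb{E}\big[ r_{t+1} + \gamma V^{\pi}(s_{t+1}) \mid s_t = s,\, a_t = a \big]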
4
votes
4 answers

Why are my Deep Q Net and Double Deep Q Net unstable?

I am trying to implement DQN and DDQN (both with experience replay) to solve the OpenAI Gym CartPole environment. Both approaches are able to learn and solve this problem sometimes, but not always. My network is simply a feed-forward…
Jack
  • 53
  • 1
  • 4
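One frequent source of the instability this question describes is the bootstrap target. The Double DQN target decouples action selection (online network) from action evaluation (target network); here is a hedged sketch where q_online and q_target are assumed to be callables returning per-action value arrays (illustrative names):

    import numpy as np

    def ddqn_target(q_online, q_target, next_state, reward, done, gamma=0.99):
        """Double DQN: the online net picks the action, the slow-moving
        target net evaluates it, which reduces overestimation bias."""
        best_action = int(np.argmax(q_online(next_state)))
        bootstrap = q_target(next_state)[best_action] * (not done)
        return reward + gamma * bootstrap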
4
votes
1 answer

Is Monte Carlo learning policy or value iteration (or something else)?

I am taking a Reinforcement Learning class and I didn’t understand how to combine the concepts of policy iteration/value iteration with Monte Carlo (and also TD/SARSA/Q-learning). In the table below, how can the empty cells be filled: Should/can it…
4
votes
2 answers

Low GPU utilisation when running TensorFlow

I've been doing deep reinforcement learning using TensorFlow and OpenAI Gym. My problem is low GPU utilisation. Googling this issue, I understood that it's wrong to expect much GPU utilisation when training small networks (e.g. for training MNIST).…
4
votes
1 answer

Large values of weights in neural network

I use Q-learning with a neural network as the approximator. After several training iterations, the weights acquire values in the range from 0 to 10. Can the weights take such values? Or does this indicate bad network parameters?
user6813020
4
votes
1 answer

How to implement Deep Q-learning gradient descent

So I'm trying to implement the Deep Q-learning algorithm created by Google DeepMind, and I think I have got a pretty good hang of it now. Yet there is still one (pretty important) thing I don't really understand, and I hope you can help. Doesn't yj…
Dope
  • 245
  • 1
  • 11
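The excerpt is cut off, but the step it points at is the gradient of the DQN loss: the target y_j is treated as a constant, so no gradient flows through it. A minimal sketch, assuming PyTorch-style tensors and illustrative names:

    import torch

    def dqn_loss(q_net, target_net, states, actions, rewards,
                 next_states, dones, gamma=0.99):
        """Squared TD error; y is computed under no_grad() so that
        gradient descent only adjusts the online network q_net."""
        q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            y = rewards + gamma * target_net(next_states).max(dim=1).values * (1 - dones)
        return ((y - q_sa) ** 2).mean()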
4
votes
1 answer

Stochastic state transitions in MDP: How does Q-learning estimate that?

I am implementing Q-learning in a grid world to find the optimal policy. One thing that is bugging me is that the state transitions are stochastic. For example, if I am in the state (3,2) and take the action 'north', I would land up at (3,1)…
Prashant Pandey
  • 4,332
  • 3
  • 26
  • 44
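For context (a standard result, not specific to this grid world): Q-learning never needs p(s'|s,a) explicitly, because each update uses a single sampled transition, and repeated sampled updates average over the stochasticity:

    Q(s,a) \leftarrow Q(s,a) + \alpha \big[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \big],
    \qquad s', r \sim p(\,\cdot \mid s, a)

Under the usual step-size conditions, this stochastic approximation converges to the same fixed point as the full expected Bellman backup.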
4
votes
4 answers

TD learning vs Q learning

In a perfect-information environment where we are able to know the state after an action, like playing chess, is there any reason to use Q-learning rather than TD (temporal difference) learning? As far as I understand, TD learning will try to learn…
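For reference, plain TD(0) learns state values from sampled transitions:

    V(s) \leftarrow V(s) + \alpha \big[ r + \gamma V(s') - V(s) \big]

With a known, deterministic model such as chess, one can act greedily using V alone by looking one move ahead (evaluating afterstates), whereas Q-learning maintains a separate value per state-action pair so that no lookahead model is needed.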
4
votes
1 answer

Tic tac toe machine learning - valid moves

I am toying around with machine learning, especially Q-learning, where you have a state and actions and give rewards depending on how well the network did. Now for starters I set myself a simple goal: train a network so it emits valid moves for…
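A common way to handle the valid-moves problem (a general technique, not necessarily what the asker tried) is action masking: set the Q-values of invalid moves to minus infinity before taking the argmax. A small sketch with illustrative names:

    import numpy as np

    def greedy_valid_action(q_values, valid_mask):
        """Pick the greedy action among valid moves only.
        q_values: per-action Q estimates; valid_mask: boolean array."""
        masked = np.where(valid_mask, q_values, -np.inf)
        return int(np.argmax(masked))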
4
votes
1 answer

DeepMind Deep Q-Network (DQN) 3D Convolution

I was reading the DeepMind Nature paper on the DQN network. I understood almost everything about it except one thing. I don't know why no one has asked this question before, but it seems a little odd to me anyway. My question: the input to DQN is an 84*84*4 image. The…
donamin
  • 43
  • 2
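For context: in the Nature DQN the four stacked frames enter as input channels of an ordinary 2D convolution rather than as a depth axis of a 3D convolution. A sketch of the paper's first layer (PyTorch used here purely for illustration):

    import torch.nn as nn

    # First conv layer of the Nature DQN: 32 filters of 8x8, stride 4,
    # applied to a 4-channel (stacked-frame) 84x84 input.
    conv1 = nn.Conv2d(in_channels=4, out_channels=32, kernel_size=8, stride=4)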
4
votes
1 answer

In Q-learning with function approximation, is it possible to avoid hand-crafting features?

I have little background knowledge of Machine Learning, so please forgive me if my question seems silly. Based on what I've read, the best model-free reinforcement learning algorithm to this date is Q-learning, where each state-action pair in the…
3
votes
1 answer

Parameter Estimation with mle in pyomo

I want to estimate the parameters of an RL model from a behavioral dataset with pyomo.

    #dummy data
    dis_data = pd.DataFrame([0,1,0,0,0,1], columns=['reward'])
    dis_data['Expt'] = str(1)
    dis_data = dis_data.set_index('Expt')
    def…
faraa
  • 575
  • 3
  • 14
  • 42
3
votes
0 answers

updating a DQN in R using neuralnet

I am trying to implement a simple case of deep Q-learning in R, using the neuralnet package. I have an initial network with random weights. I use it to generate some experience for my agent, and as a result I get states and targets. Then I…
3
votes
1 answer

Relationship between the Bellman optimality equation and Q-learning

The optimal state-action value given by the Bellman optimality equation (page 63 of Sutton & Barto 2018) is q*(s,a) = \sum_{s',r} p(s',r|s,a) [r + \gamma \max_{a'} q*(s',a')], and the Q-learning update is Q(s,a) \leftarrow Q(s,a) + \alpha [r + \gamma \max_{a'} Q(s',a') - Q(s,a)]. I know that Q-learning is model-free, so it doesn't need the probability of transition to the next state. However, the p(s',r|s,a) in the Bellman…
3
votes
0 answers

How to build a Q-table of states/actions for robocode?

So my problem is with understanding the creation of a Q-table for states with more parameters per state, as in Robocode. 99% of all examples online are just too simple, and it is hard to imagine it for an environment as complicated as this one. From what I…
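One standard approach for multi-parameter states (a general technique with made-up bucket choices, not Robocode-specific advice) is to discretise each continuous parameter into a few buckets and index the Q-table by the tuple of bucket indices:

    import numpy as np

    # Illustrative discretisation: 10 distance buckets, 8 bearing buckets,
    # 4 energy buckets -> a (10, 8, 4, n_actions) Q-table.
    N_ACTIONS = 5
    q_table = np.zeros((10, 8, 4, N_ACTIONS))

    def state_index(distance, bearing, energy):
        """Map raw readings to table indices; the bucket edges here are
        invented for illustration and should match real value ranges."""
        d = min(int(distance / 100), 9)   # distance -> 10 buckets
        b = int((bearing % 360) / 45)     # 0-359 degrees -> 8 buckets
        e = min(int(energy / 34), 3)      # 0-100 energy -> 4 buckets
        return d, b, e

    # Usage: q_table[state_index(250, 90, 80)] is the row of action values.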