Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function that gives the expected utility (return) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function that tells how well or how badly the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states or actions better than currently estimated). A common, simple way of handling this trade-off is an epsilon-greedy policy, as sketched below.
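Below is a minimal sketch of tabular Q-learning with an epsilon-greedy policy, assuming a small discrete environment; the env.reset()/env.step() interface and all names are illustrative, not from any particular library.

    import numpy as np

    def q_learning(env, n_states, n_actions, episodes=500,
                   alpha=0.1, gamma=0.99, epsilon=0.1):
        """Tabular Q-learning; env is assumed to expose reset() -> state
        and step(action) -> (next_state, reward, done)."""
        Q = np.zeros((n_states, n_actions))
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # Epsilon-greedy: explore with probability epsilon,
                # otherwise act greedily w.r.t. the current estimates.
                if np.random.rand() < epsilon:
                    action = np.random.randint(n_actions)
                else:
                    action = int(np.argmax(Q[state]))
                next_state, reward, done = env.step(action)
                # Off-policy TD update toward the greedy bootstrap target.
                target = reward + gamma * np.max(Q[next_state]) * (not done)
                Q[state, action] += alpha * (target - Q[state, action])
                state = next_state
        return Q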

447 questions
4
votes
1 answer

What exactly is the difference between Q, V (value function), and reward in Reinforcement Learning?

In the context of Double Q or Dueling Q Networks, I am not sure if I fully understand the difference, especially with V. What exactly is V(s)? How can a state have an inherent value? If we are considering this in the context of trading stocks…
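For reference, the standard textbook relationships between the three quantities (not specific to the question above): the reward r is the one-step signal from the environment, Q^π(s,a) is the expected return from taking action a in state s and following policy π afterwards, and V^π(s) is Q^π averaged over the policy's action choices:

    V^{\pi}(s) = \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ Q^{\pi}(s,a) \big],
    \qquad
    Q^{\pi}(s,a) = \mathbb{E}\big[ r_{t+1} + \gamma V^{\pi}(s_{t+1}) \mid s_t = s,\, a_t = a \big]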
4
votes
4 answers

Why are my Deep Q Net and Double Deep Q Net unstable?

I am trying to implement DQN and DDQN (both with experience replay) to solve the OpenAI Gym CartPole environment. Both approaches are able to learn and solve this problem sometimes, but not always. My network is simply a feed-forward…
Jack
  • 53
  • 1
  • 4
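One frequent source of the instability this question describes is the bootstrap target. The Double DQN target decouples action selection (online network) from action evaluation (target network); here is a hedged sketch where q_online and q_target are assumed to be callables returning per-action value arrays (illustrative names):

    import numpy as np

    def ddqn_target(q_online, q_target, next_state, reward, done, gamma=0.99):
        """Double DQN: the online net picks the action, the slow-moving
        target net evaluates it, which reduces overestimation bias."""
        best_action = int(np.argmax(q_online(next_state)))
        bootstrap = q_target(next_state)[best_action] * (not done)
        return reward + gamma * bootstrap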
4
votes
1 answer

Is Monte Carlo learning policy or value iteration (or something else)?

I am taking a Reinforcement Learning class and I didn’t understand how to combine the concepts of policy iteration/value iteration with Monte Carlo (and also TD/SARSA/Q-learning). In the table below, how can the empty cells be filled: Should/can it…
4
votes
2 answers

Low GPU utilisation when running TensorFlow

I've been doing deep reinforcement learning using TensorFlow and OpenAI Gym. My problem is low GPU utilisation. Googling this issue, I understood that it's wrong to expect much GPU utilisation when training small networks (e.g. for training MNIST).…
4
votes
1 answer

Large values of weights in neural network

I use Q-learning with a neural network as the approximator. After several training iterations, the weights acquire values in the range from 0 to 10. Can the weights take such values? Or does this indicate bad network parameters?
user6813020
4
votes
1 answer

How to implement Deep Q-learning gradient descent

So I'm trying to implement the Deep Q-learning algorithm created by Google DeepMind, and I think I have got a pretty good hang of it now. Yet there is still one (pretty important) thing I don't really understand, and I hope you can help. Doesn't yj…
Dope
  • 245
  • 1
  • 11
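The excerpt is cut off, but the step it points at is the gradient of the DQN loss: the target y_j is treated as a constant, so no gradient flows through it. A minimal sketch, assuming PyTorch-style tensors and illustrative names:

    import torch

    def dqn_loss(q_net, target_net, states, actions, rewards,
                 next_states, dones, gamma=0.99):
        """Squared TD error; y is computed under no_grad() so that
        gradient descent only adjusts the online network q_net."""
        q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            y = rewards + gamma * target_net(next_states).max(dim=1).values * (1 - dones)
        return ((y - q_sa) ** 2).mean()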
4
votes
1 answer

Stochastic state transitions in MDP: How does Q-learning estimate that?

I am implementing Q-learning in a grid world to find the optimal policy. One thing that is bugging me is that the state transitions are stochastic. For example, if I am in the state (3,2) and take the action 'north', I would land up at (3,1)…
Prashant Pandey
  • 4,332
  • 3
  • 26
  • 44
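For context (a standard result, not specific to this grid world): Q-learning never needs p(s'|s,a) explicitly, because each update uses a single sampled transition, and repeated sampled updates average over the stochasticity:

    Q(s,a) \leftarrow Q(s,a) + \alpha \big[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \big],
    \qquad s', r \sim p(\,\cdot \mid s, a)

Under the usual step-size conditions, this stochastic approximation converges to the same fixed point as the full expected Bellman backup.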
4
votes
4 answers

TD learning vs Q learning

In a perfect-information environment where we are able to know the state after an action, like playing chess, is there any reason to use Q-learning rather than TD (temporal difference) learning? As far as I understand, TD learning will try to learn…
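For reference, plain TD(0) learns state values from sampled transitions:

    V(s) \leftarrow V(s) + \alpha \big[ r + \gamma V(s') - V(s) \big]

With a known, deterministic model such as chess, one can act greedily using V alone by looking one move ahead (evaluating afterstates), whereas Q-learning maintains a separate value per state-action pair so that no lookahead model is needed.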
4
votes
1 answer

Tic tac toe machine learning - valid moves

I am toying around with machine learning, especially Q-learning, where you have a state and actions and give rewards depending on how well the network did. Now for starters I set myself a simple goal: train a network so it emits valid moves for…
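A common way to handle the valid-moves problem (a general technique, not necessarily what the asker tried) is action masking: set the Q-values of invalid moves to minus infinity before taking the argmax. A small sketch with illustrative names:

    import numpy as np

    def greedy_valid_action(q_values, valid_mask):
        """Pick the greedy action among valid moves only.
        q_values: per-action Q estimates; valid_mask: boolean array."""
        masked = np.where(valid_mask, q_values, -np.inf)
        return int(np.argmax(masked))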
4
votes
1 answer

DeepMind Deep Q-Network (DQN) 3D Convolution

I was reading the DeepMind Nature paper on the DQN network. I understood almost everything about it except one thing. I don't know why no one has asked this question before, but it seems a little odd to me anyway. My question: the input to DQN is an 84*84*4 image. The…
donamin
  • 43
  • 2
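For context: in the Nature DQN the four stacked frames enter as input channels of an ordinary 2D convolution rather than as a depth axis of a 3D convolution. A sketch of the paper's first layer (PyTorch used here purely for illustration):

    import torch.nn as nn

    # First conv layer of the Nature DQN: 32 filters of 8x8, stride 4,
    # applied to a 4-channel (stacked-frame) 84x84 input.
    conv1 = nn.Conv2d(in_channels=4, out_channels=32, kernel_size=8, stride=4)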
4
votes
1 answer

In Q-learning with function approximation, is it possible to avoid hand-crafting features?

I have little background knowledge of Machine Learning, so please forgive me if my question seems silly. Based on what I've read, the best model-free reinforcement learning algorithm to this date is Q-learning, where each state-action pair in the…
3
votes
1 answer

Parameter Estimation with mle in pyomo

I want to estimate the parameters of an RL model from a behavioral dataset with pyomo.

    #dummy data
    dis_data = pd.DataFrame([0,1,0,0,0,1], columns=['reward'])
    dis_data['Expt'] = str(1)
    dis_data = dis_data.set_index('Expt')
    def…
faraa
  • 575
  • 3
  • 14
  • 42
3
votes
0 answers

updating a DQN in R using neuralnet

I am trying to implement a simple case of deep Q-learning in R, using the neuralnet package. I have an initial network with random weights. I use it to generate some experience for my agent, and as a result I get states and targets. Then I…
3
votes
1 answer

Relationship between the Bellman optimality equation and Q-learning

The optimal state-action value given by the Bellman optimality equation (page 63 of Sutton & Barto 2018) is q*(s,a) = \sum_{s',r} p(s',r|s,a) [r + \gamma \max_{a'} q*(s',a')], and the Q-learning update is Q(s,a) \leftarrow Q(s,a) + \alpha [r + \gamma \max_{a'} Q(s',a') - Q(s,a)]. I know that Q-learning is model-free, so it doesn't need the probability of transition to the next state. However, the p(s',r|s,a) in the Bellman…
3
votes
0 answers

How to build a Q-table of states/actions for robocode?

So my problem is with understanding the creation of a Q-table for states with more parameters per state, as in Robocode. 99% of all examples online are just too simple, and it is hard to imagine it for an environment as complicated as this one. From what I…
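One standard approach for multi-parameter states (a general technique with made-up bucket choices, not Robocode-specific advice) is to discretise each continuous parameter into a few buckets and index the Q-table by the tuple of bucket indices:

    import numpy as np

    # Illustrative discretisation: 10 distance buckets, 8 bearing buckets,
    # 4 energy buckets -> a (10, 8, 4, n_actions) Q-table.
    N_ACTIONS = 5
    q_table = np.zeros((10, 8, 4, N_ACTIONS))

    def state_index(distance, bearing, energy):
        """Map raw readings to table indices; the bucket edges here are
        invented for illustration and should match real value ranges."""
        d = min(int(distance / 100), 9)   # distance -> 10 buckets
        b = int((bearing % 360) / 45)     # 0-359 degrees -> 8 buckets
        e = min(int(energy / 34), 3)      # 0-100 energy -> 4 buckets
        return d, b, e

    # Usage: q_table[state_index(250, 90, 80)] is the row of action values.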