Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function that gives the expected utility (cumulative reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well or how badly the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily in terms of the current action-value function) with exploration (acting randomly to discover new states or actions better than currently estimated). A common simple way of handling this trade-off is an epsilon-greedy policy.
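The tabular update and the epsilon-greedy action choice described above can be sketched as follows (a minimal illustration with a hypothetical state and action space; hyperparameter values are placeholders):

```python
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1  # learning rate, discount, exploration rate
N_STATES, N_ACTIONS = 5, 2

# Q-table initialised to zero: Q[state][action]
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def choose_action(state):
    """Epsilon-greedy: explore with probability EPSILON, otherwise act greedily."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])

def update(state, action, reward, next_state):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = reward + GAMMA * max(Q[next_state])
    Q[state][action] += ALPHA * (target - Q[state][action])

# one hypothetical transition: s=0, a=1, r=1.0, s'=2
update(0, 1, 1.0, 2)
```

Note the `max` over next-state actions in the target: that is what makes Q-learning off-policy, since the update assumes greedy behaviour regardless of how the action was actually chosen.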

447 questions
1
vote
1 answer

Q-values get too high, values become NaN, Q-Learning Tensorflow

I programmed a very easy game which works the following way: given a 4x4 field of squares, a player can move (up, right, down or left). Moving onto a square the agent has never visited before gives a reward of 1. Stepping on a "dead field" is rewarded with…
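The reward scheme this question describes (first visit to a square gives +1) might look like the following in outline; the dead-field positions and their penalty are assumptions, since the excerpt is cut off:

```python
SIZE = 4
DEAD_FIELDS = {(1, 1)}  # hypothetical "dead" squares; the question's layout is unknown
visited = set()

def step_reward(pos):
    """Reward 1.0 for a square never visited before; an assumed -1.0 for a dead field."""
    if pos in DEAD_FIELDS:
        return -1.0
    if pos not in visited:
        visited.add(pos)
        return 1.0
    return 0.0
```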
1
vote
1 answer

What do square brackets on their own represent?

Hi, I'm adapting some Python code and making sure I understand everything within it, as I've never really worked with Python before. What does the [0] alone mean in the code? (qtable is a 2-dimensional array, holding states (s) and…
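For context: `[0]` simply takes the first element of whatever the preceding expression returns. With a 2-dimensional Q-table, `qtable[s]` is the row of action values for state `s`, and a trailing `[0]` picks the first entry of that row (values below are made up for illustration):

```python
# A tiny 2-D Q-table: rows are states, columns are actions.
qtable = [
    [0.5, 0.2],  # state 0
    [0.1, 0.9],  # state 1
]

row = qtable[1]       # all action values for state 1
first = qtable[1][0]  # value of action 0 in state 1
```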
1
vote
1 answer

Create a specific tensor from another tensor

q_pred = self.Q.forward(states) gives me the following output : tensor([[-4.4713e-02, 4.2878e-03], [-2.2801e-01, 2.2295e-01], [-9.8098e-03, -1.0766e-01], [-1.4654e-01, 1.2742e-01], [-1.6224e-01, 1.6565e-01], …
jgauth
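The usual way to turn a (batch, n_actions) prediction like the one above into a per-sample tensor of the Q-values for the actions actually taken is advanced indexing. A NumPy sketch of the same idea follows (values are hypothetical; in PyTorch the equivalent is `torch.gather` or the same fancy indexing on the tensor):

```python
import numpy as np

# Hypothetical batch of predictions, shape (batch, n_actions), like q_pred above.
q_pred = np.array([
    [-0.0447, 0.0043],
    [-0.2280, 0.2230],
    [-0.0098, -0.1077],
])
actions = np.array([1, 0, 1])  # action taken in each transition

# Select Q(s_i, a_i) for every row i.
q_taken = q_pred[np.arange(len(actions)), actions]
```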
1
vote
0 answers

Deep Value-only Reinforcement Learning: Train V(s) instead of Q(s,a)?

Is there a value-based (deep) reinforcement learning (RL) algorithm available that is centred fully around learning only the state-value function V(s), rather than the state-action value function Q(s,a)? If not, why not, or could it easily be…
1
vote
0 answers

Deep Q Learning: question about back propagation

I'm trying to create a reinforcement-learning neural network for the CartPole-v0 problem from OpenAI Gym. I understand that to find the error of the neural network I must calculate the target Q-value from the Bellman equation and subtract that from…
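The target described there, the reward plus the discounted maximum predicted value of the next state, compared against the network's current estimate, can be written out as follows (NumPy arrays stand in for network outputs; all values are hypothetical):

```python
import numpy as np

GAMMA = 0.99  # discount factor (assumed)

def td_error(reward, done, q_next, q_current, action):
    """TD error for one transition: Bellman target minus current estimate.

    q_next: predicted Q-values for the next state (array over actions)
    q_current: predicted Q-values for the current state
    """
    # Terminal states contribute no future value.
    target = reward + (0.0 if done else GAMMA * np.max(q_next))
    return target - q_current[action]

err = td_error(reward=1.0, done=False,
               q_next=np.array([0.5, 0.2]),
               q_current=np.array([0.3, 0.1]),
               action=0)
```

In a deep Q-network this error (usually squared, and averaged over a batch) is the loss that backpropagation minimises; the target itself is treated as a constant, not differentiated through.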
1
vote
1 answer

Maximum Q-values in practical scenario?

Q-learning is a very simple thing to implement and can be easily applied to explore and solve various environments or games. But as the complexity of the states and the number of possible actions increase, the practicality of Q-learning…
neel g
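On the question of how large Q-values can get: with a discount factor γ < 1 and per-step rewards bounded by r_max, the return, and hence any correct Q-value, is bounded by the geometric series:

```latex
|Q(s,a)| \le \sum_{t=0}^{\infty} \gamma^{t} \, r_{\max} = \frac{r_{\max}}{1-\gamma}
```

Estimates that grow far beyond this bound during training are a common sign of divergence (e.g. from a learning rate that is too high or from bootstrapping off overestimated targets).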
1
vote
2 answers

Chrome T-Rex-Game Reinforcement learning showing no improvement

I would like to create an AI for the Chrome-No-Internet-Dino-Game. To that end, I adapted this GitHub repository to fit my needs. I used the following formula to calculate the new Q: Source: https://en.wikipedia.org/wiki/Q-learning My problem now is…
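The Wikipedia formula referred to is the standard Q-learning update, with learning rate α and discount factor γ:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```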
1
vote
0 answers

Q-learning algorithm rewards generation

I am studying the Q-learning algorithm (this is the tutorial I am following: https://blog.floydhub.com/an-introduction-to-q-learning-reinforcement-learning/). Basically, we have some set of states (and some walls between them) and we need to be…
Petur Ulev
1
vote
0 answers

What is the best way to deal with imbalanced sample database with rewards

I am looking for a solution to train a DNNClassifier (4 classes, 20 numeric features) from a datafile of imbalanced, rewarded samples. Each class represents a game action, and the reward is the action's score. The features are the given observations. So it looks like Q-learning…
GerardL
1
vote
2 answers

Deep Q Network gives same Q values and doesn't improve

I'm trying to build a deep Q network to play snake. I've run into an issue where the agent doesn't learn and its performance at the end of the training cycle is to repeatedly kill itself. After a bit of debugging, I figured out that the Q values the…
1
vote
2 answers

Q-Learning AI Isn't Recognizing Easy Pattern

I have a Q-learning program trying to predict my simulated stock market, where the price of the stock goes 1-2-3-1-2-3… I have been trying to debug this for a few days and just can't get it. I even completely started from scratch and the…
user10034548
1
vote
0 answers

Is it possible to play against a trained bot from the keyboard using gym (Pong)?

I've been working on training a bot (DQN) on Pong using openai-gym and torch. After some success on the PongNoFrameskip-v4 environment, it occurred to me that it would be super nice to be able to play against this bot after training (and…
1
vote
0 answers

Q-learning: what are the states, actions and rewards for the game of rummy?

I am working on a Q-learning algorithm for rummy. I have to generate a Q-table indexed as Q[state, action]. Since in rummy the actions are either pick or drop, I have that dimension set to 2, whereas for the states, what is their number?…
1
vote
0 answers

Unable to create observation space array with more than two dimensions

We are trying to make a Tetris AI on a Raspberry Pi which is connected to a board with WS2812b LEDs. We defined the action_space and observation_space like this: self.action_space = spaces.Discrete(6) self.observation_space = spaces.Discrete(width…
1
vote
0 answers

Implementing DDPG in tensorflow 2.0

https://stackoverflow.com/a/52340133/11204016 In the above answer, I'm stuck on the 7th step. How can we multiply dQ/dA and dA/dTheta? dQ/dA has the dimension of the batch size, while dA/dTheta has the dimensions of the network weights (the thetas of the Q network).
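The product being asked about is the deterministic policy gradient at the heart of DDPG: the chain rule is applied per sample, with dQ/dA evaluated at the actor's output, and the resulting per-sample gradients are averaged over the batch rather than multiplied as whole matrices:

```latex
\nabla_{\theta} J \approx \frac{1}{N} \sum_{i=1}^{N}
\nabla_{a} Q(s_i, a)\big|_{a=\mu_{\theta}(s_i)} \; \nabla_{\theta}\, \mu_{\theta}(s_i)
```

In automatic-differentiation frameworks this is typically realised by backpropagating the (negated, for gradient ascent) mean of Q(s, μ(s)) through both networks with the critic's weights held fixed, rather than by forming the two Jacobians explicitly.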