Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function that gives the expected utility (cumulative reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well or how badly the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily in terms of the current action-value function) with exploration (acting randomly to discover new states or actions better than currently estimated). A common simple way of handling this trade-off is an epsilon-greedy policy.
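The tabular update and the epsilon-greedy action choice described above can be sketched as follows (a minimal illustration with a hypothetical state and action space; hyperparameter values are placeholders):

```python
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1  # learning rate, discount, exploration rate
N_STATES, N_ACTIONS = 5, 2

# Q-table initialised to zero: Q[state][action]
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def choose_action(state):
    """Epsilon-greedy: explore with probability EPSILON, otherwise act greedily."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])

def update(state, action, reward, next_state):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = reward + GAMMA * max(Q[next_state])
    Q[state][action] += ALPHA * (target - Q[state][action])

# one hypothetical transition: s=0, a=1, r=1.0, s'=2
update(0, 1, 1.0, 2)
```

Note the `max` over next-state actions in the target: that is what makes Q-learning off-policy, since the update assumes greedy behaviour regardless of how the action was actually chosen.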

447 questions
1
vote
1 answer

Q-values get too high, values become NaN, Q-Learning Tensorflow

I programmed a very easy game which works the following way: given a 4x4 field of squares, a player can move (up, right, down or left). Moving onto a square the agent has never visited before gives a reward of 1. Stepping on a "dead field" is rewarded with…
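The reward scheme this question describes (first visit to a square gives +1) might look like the following in outline; the dead-field positions and their penalty are assumptions, since the excerpt is cut off:

```python
SIZE = 4
DEAD_FIELDS = {(1, 1)}  # hypothetical "dead" squares; the question's layout is unknown
visited = set()

def step_reward(pos):
    """Reward 1.0 for a square never visited before; an assumed -1.0 for a dead field."""
    if pos in DEAD_FIELDS:
        return -1.0
    if pos not in visited:
        visited.add(pos)
        return 1.0
    return 0.0
```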
1
vote
1 answer

What do square brackets on their own represent?

Hi, I'm adapting some Python code and making sure I understand everything within it, as I've never really worked with Python before. What does the [0] alone mean in the code? (qtable is a 2-dimensional array, holding states (s) and…
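For context: `[0]` simply takes the first element of whatever the preceding expression returns. With a 2-dimensional Q-table, `qtable[s]` is the row of action values for state `s`, and a trailing `[0]` picks the first entry of that row (values below are made up for illustration):

```python
# A tiny 2-D Q-table: rows are states, columns are actions.
qtable = [
    [0.5, 0.2],  # state 0
    [0.1, 0.9],  # state 1
]

row = qtable[1]       # all action values for state 1
first = qtable[1][0]  # value of action 0 in state 1
```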
1
vote
1 answer

Create a specific tensor from another tensor

q_pred = self.Q.forward(states) gives me the following output : tensor([[-4.4713e-02, 4.2878e-03], [-2.2801e-01, 2.2295e-01], [-9.8098e-03, -1.0766e-01], [-1.4654e-01, 1.2742e-01], [-1.6224e-01, 1.6565e-01], …
jgauth
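The usual way to turn a (batch, n_actions) prediction like the one above into a per-sample tensor of the Q-values for the actions actually taken is advanced indexing. A NumPy sketch of the same idea follows (values are hypothetical; in PyTorch the equivalent is `torch.gather` or the same fancy indexing on the tensor):

```python
import numpy as np

# Hypothetical batch of predictions, shape (batch, n_actions), like q_pred above.
q_pred = np.array([
    [-0.0447, 0.0043],
    [-0.2280, 0.2230],
    [-0.0098, -0.1077],
])
actions = np.array([1, 0, 1])  # action taken in each transition

# Select Q(s_i, a_i) for every row i.
q_taken = q_pred[np.arange(len(actions)), actions]
```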
1
vote
0 answers

Deep Value-only Reinforcement Learning: Train V(s) instead of Q(s,a)?

Is there a value-based (deep) reinforcement learning (RL) algorithm available that is centred fully around learning only the state-value function V(s), rather than the state-action value function Q(s,a)? If not, why not, or could it easily be…
1
vote
0 answers

Deep Q Learning: question about back propagation

I'm trying to create a reinforcement-learning neural network for the CartPole-v0 problem from OpenAI Gym. I understand that to find the error of the neural network I must calculate the target Q-value from the Bellman equation and subtract that from…
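The target described there, the reward plus the discounted maximum predicted value of the next state, compared against the network's current estimate, can be written out as follows (NumPy arrays stand in for network outputs; all values are hypothetical):

```python
import numpy as np

GAMMA = 0.99  # discount factor (assumed)

def td_error(reward, done, q_next, q_current, action):
    """TD error for one transition: Bellman target minus current estimate.

    q_next: predicted Q-values for the next state (array over actions)
    q_current: predicted Q-values for the current state
    """
    # Terminal states contribute no future value.
    target = reward + (0.0 if done else GAMMA * np.max(q_next))
    return target - q_current[action]

err = td_error(reward=1.0, done=False,
               q_next=np.array([0.5, 0.2]),
               q_current=np.array([0.3, 0.1]),
               action=0)
```

In a deep Q-network this error (usually squared, and averaged over a batch) is the loss that backpropagation minimises; the target itself is treated as a constant, not differentiated through.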
1
vote
1 answer

Maximum Q-values in practical scenario?

Q-learning is a very simple thing to implement and can be easily applied to explore and solve various environments or games. But as the complexity of the states and the number of possible actions increase, the practicality of Q-learning…
neel g
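On the question of how large Q-values can get: with a discount factor γ < 1 and per-step rewards bounded by r_max, the return, and hence any correct Q-value, is bounded by the geometric series:

```latex
|Q(s,a)| \le \sum_{t=0}^{\infty} \gamma^{t} \, r_{\max} = \frac{r_{\max}}{1-\gamma}
```

Estimates that grow far beyond this bound during training are a common sign of divergence (e.g. from a learning rate that is too high or from bootstrapping off overestimated targets).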
1
vote
2 answers

Chrome T-Rex-Game Reinforcement learning showing no improvement

I would like to create an AI for the Chrome-No-Internet-Dino-Game. To that end, I adapted this GitHub repository to fit my needs. I used the following formula to calculate the new Q: Source: https://en.wikipedia.org/wiki/Q-learning My problem now is…
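The Wikipedia formula referred to is the standard Q-learning update, with learning rate α and discount factor γ:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```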
1
vote
0 answers

Q-learning algorithm rewards generation

I am studying the Q-learning algorithm (this is the tutorial I am following: https://blog.floydhub.com/an-introduction-to-q-learning-reinforcement-learning/). Basically, we have some set of states (and some walls between them) and we need to be…
Petur Ulev
1
vote
0 answers

What is the best way to deal with imbalanced sample database with rewards

I am looking for a solution to train a DNNClassifier (4 classes, 20 numeric features) from a datafile of imbalanced, rewarded samples. Each class represents a game action, and the reward is the action's score. The features are the given observations. So it looks like Q-learning…
GerardL
1
vote
2 answers

Deep Q Network gives same Q values and doesn't improve

I'm trying to build a deep Q network to play snake. I've run into an issue where the agent doesn't learn and its performance at the end of the training cycle is to repeatedly kill itself. After a bit of debugging, I figured out that the Q values the…
1
vote
2 answers

Q-Learning AI Isn't Recognizing Easy Pattern

I have a Q-learning program trying to predict my simulated stock market, where the price of the stock goes 1-2-3-1-2-3… I have been trying to debug this for a few days and just can't get it. I even completely started from scratch and the…
user10034548
1
vote
0 answers

Is it possible to play against a trained bot from the keyboard using gym (Pong)?

I've been working on training a bot (DQN) on Pong using openai-gym and torch. After some success on the PongNoFrameskip-v4 environment, it occurred to me that it would be super nice to be able to play against this bot after training (and…
1
vote
0 answers

Q-learning: what are the states, actions and rewards for the game of rummy?

I am working on a Q-learning algorithm for rummy. I have to generate a Q-table indexed as Q[state, action]. Since in rummy the actions are either pick or drop, I have that dimension set to 2, whereas for the states, what is their number?…
1
vote
0 answers

Unable to create observation space array with more than two dimensions

We are trying to make a Tetris AI on a Raspberry Pi which is connected to a board with WS2812b LEDs. We defined the action_space and observation_space like this: self.action_space = spaces.Discrete(6) self.observation_space = spaces.Discrete(width…
1
vote
0 answers

Implementing DDPG in tensorflow 2.0

https://stackoverflow.com/a/52340133/11204016 In the above answer, I'm stuck on the 7th step. How can we multiply dQ/dA and dA/dTheta? dQ/dA has the dimension of the batch size, while dA/dTheta has the dimensions of the network weights (the thetas of the Q network).
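The product being asked about is the deterministic policy gradient at the heart of DDPG: the chain rule is applied per sample, with dQ/dA evaluated at the actor's output, and the resulting per-sample gradients are averaged over the batch rather than multiplied as whole matrices:

```latex
\nabla_{\theta} J \approx \frac{1}{N} \sum_{i=1}^{N}
\nabla_{a} Q(s_i, a)\big|_{a=\mu_{\theta}(s_i)} \; \nabla_{\theta}\, \mu_{\theta}(s_i)
```

In automatic-differentiation frameworks this is typically realised by backpropagating the (negated, for gradient ascent) mean of Q(s, μ(s)) through both networks with the critic's weights held fixed, rather than by forming the two Jacobians explicitly.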