Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function giving the expected utility (return) of taking a given action in a given state and acting optimally thereafter.

One of the strengths of Q-learning is that it only needs a reinforcement (reward) function to be given, i.e. a function which tells how well or how badly the agent is performing. During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states or actions better than those currently estimated). A common, simple way to handle this trade-off is an epsilon-greedy policy, sketched below.
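A minimal tabular sketch of the update rule and epsilon-greedy action selection (the action set, environment interface, and hyperparameter values are illustrative assumptions, not part of the tag wiki):

```python
import random
from collections import defaultdict

ACTIONS = [0, 1, 2, 3]                  # illustrative discrete action set
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1  # illustrative hyperparameters

Q = defaultdict(float)  # Q[(state, action)], missing entries default to 0.0

def epsilon_greedy(state):
    # Explore with probability epsilon, otherwise act greedily.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, done):
    # Off-policy target: bootstrap from the greedy next action,
    # regardless of which action the behaviour policy actually took.
    target = reward
    if not done:
        target += GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```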

447 questions
0
votes
1 answer

Deep Q-learning - TensorFlow - weights won't change

I'm trying to write a DQL algorithm and I'm trying to run the following graph in TensorFlow: class DQN: def __init__(self, env, n_hidden, learning_rate): self.image_input = tf.placeholder(shape=[None, 128, 128, 3], dtype=tf.float32) …
0
votes
1 answer

Q-table representation

As far as I understand Q-learning, a Q-value is a measure of "how good" a particular state-action pair is. This is usually represented in a table in one of the following ways (see figure). Are both representations valid? How do you determine the…
ajikodajis
  • 15
  • 4
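For reference, the two table layouts askers usually mean are a dense states-by-actions array and a sparse mapping keyed by (state, action); a minimal sketch with illustrative sizes (both are valid as long as lookups and argmax are done consistently):

```python
import numpy as np

N_STATES, N_ACTIONS = 25, 4  # illustrative sizes

# Layout 1: dense table, rows = states, columns = actions.
Q_dense = np.zeros((N_STATES, N_ACTIONS))
Q_dense[3, 1] = 0.5          # Q(s=3, a=1)

# Layout 2: sparse mapping, convenient when only a few
# state-action pairs are ever visited.
Q_sparse = {}
Q_sparse[(3, 1)] = 0.5       # same value, keyed by (state, action)
```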
0
votes
0 answers

Q-Learning in R: Updating a matrix based on the adjacency of two cells in another matrix using nested loops

My first question on Stack Overflow. I'm trying to update a 25x25 matrix representing the states in a 5x5 grid. The rows represent current states and the columns the next states. I'm using a formula given below to assess the adjacency of a given state…
T-Rone
  • 1
  • 3
0
votes
1 answer

Speedy Q-Learning

I've read on Wikipedia (https://en.wikipedia.org/wiki/Q-learning): Q-learning may suffer from a slow rate of convergence, especially when the discount factor γ is close to one. Speedy Q-learning, a new variant of…
0
votes
1 answer

exploration and exploitation in Q-learning

In the Q-learning algorithm, the selection of an action depends on the current state and the values of the Q-matrix. I want to know whether these Q-values are updated only during exploration steps, or whether they also change during exploitation steps.
user22
  • 13
  • 4
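In standard Q-learning the TD update is applied on every transition, whether the action came from exploring or exploiting; a self-contained sketch of one episode (the `env` object with `reset()`/`step()` is a hypothetical interface, and the hyperparameter defaults are illustrative):

```python
import random
from collections import defaultdict

def run_episode(env, actions, Q=None, alpha=0.1, gamma=0.99, epsilon=0.1):
    """One episode of tabular Q-learning.

    env is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done)."""
    Q = Q if Q is not None else defaultdict(float)
    state, done = env.reset(), False
    while not done:
        explored = random.random() < epsilon
        action = (random.choice(actions) if explored
                  else max(actions, key=lambda a: Q[(state, a)]))
        next_state, reward, done = env.step(action)
        # The update runs on every step, exploratory or greedy,
        # so Q-values keep changing during exploitation as well.
        best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state
    return Q
```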
0
votes
0 answers

Q-learning algorithm

Good afternoon, I used Q-learning to model the following problem: a set of agents have access to 2 access points (APs) for uploading data. S = {1, 2} is the set of states, referring to a connection to AP1 or AP2. A = {remain, change}. We suppose…
student26
  • 13
  • 8
0
votes
1 answer

In Q Learning, how can you ever actually get a Q value? Wouldn't Q(s,a) just go on forever?

I've been studying up on reinforcement learning, but the thing I don't understand is how a Q-value is ever calculated. If you use the Bellman equation Q(s,a) = r + γ*max(Q(s',a')), wouldn't it just go on forever? Because Q(s',a') would need the Q…
user5702166
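In practice the recursion never unrolls: each update reads the currently stored estimate of Q(s',a') (initialised, e.g., to 0) and the estimates converge over repeated updates. A toy two-state sketch (the states, rewards, and constants are made up for illustration):

```python
from collections import defaultdict

GAMMA, ALPHA = 0.9, 0.5
Q = defaultdict(float)  # all estimates start at 0.0

def td_update(s, a, r, s_next, actions):
    # Q(s', a') is not expanded recursively; we just read the
    # current stored estimate and nudge Q(s, a) toward the target.
    bootstrap = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * bootstrap - Q[(s, a)])

# Toy chain: action 0 in state 0 pays reward 1 and leads to state 1,
# which self-loops with reward 0.
for _ in range(20):
    td_update(0, 0, 1.0, 1, actions=[0])
    td_update(1, 0, 0.0, 1, actions=[0])
print(round(Q[(0, 0)], 3))  # approaches 1.0 instead of recursing forever
```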
0
votes
1 answer

iterations and reward in q-learning

Good morning, In Q-learning, the agents take actions until reaching their goal. The algorithm is executed many times until convergence. For example, the goal is to obtain a maximum throughput until the end of the simulation time. The…
student26
  • 13
  • 8
0
votes
1 answer

Pre-order exploration of tictactoe search space not generating all states

I am trying to implement q-learning for tictactoe. One of the steps in doing so involves enumerating all the possible states of the tictactoe board to form a state-value table. I have written a procedure to recursively generate all possible states…
SpiderWasp42
  • 2,526
  • 1
  • 12
  • 17
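The asker's procedure isn't shown; for context, a generic sketch of such a recursive enumeration, which has to stop expanding at terminal boards (wins and full boards) and deduplicate positions reached by different move orders:

```python
def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    lines = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]
    for i, j, k in lines:
        if board[i] != ' ' and board[i] == board[j] == board[k]:
            return board[i]
    return None

def enumerate_states(board=' ' * 9, player='X', seen=None):
    """Pre-order enumeration of all boards reachable from the empty board."""
    if seen is None:
        seen = set()
    if board in seen:
        return seen
    seen.add(board)
    if winner(board) or ' ' not in board:
        return seen  # terminal board: do not expand further
    nxt = 'O' if player == 'X' else 'X'
    for i, c in enumerate(board):
        if c == ' ':
            enumerate_states(board[:i] + player + board[i+1:], nxt, seen)
    return seen

print(len(enumerate_states()))  # 5478 legal positions, empty board included
```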
0
votes
1 answer

Reward function with a neural network approximated Q-function

In Q-learning, how should I represent my reward function if my Q-function is approximated by a normal feed-forward neural network? Should I represent it as discrete values such as "near", "very near" to the goal, etc.? All I'm concerned about is that…
0
votes
1 answer

Q-learning Updating Frequency

In Q-learning, from its current state, the agent takes an action at every discrete time step, and after an action is performed it receives an immediate reward to assess the success or failure of the performed action. Let's say that we want to…
0
votes
1 answer

Javascript - Preventing Chrome From Killing Page during long loop

Chrome keeps killing the page in the middle of my connect-four browser game when it is running properly. The game is a player vs computer setup and the game itself runs properly and never crashes. The page crashes when I set the number of iterations…
Matt S
  • 1,434
  • 1
  • 15
  • 16
0
votes
1 answer

How can I improve the performance of a feedforward network as a q-value function approximator?

I'm trying to navigate an agent in a n*n gridworld domain by using Q-Learning + a feedforward neural network as a q-function approximator. Basically the agent should find the best/shortest way to reach a certain terminal goal position (+10 reward).…
0
votes
1 answer

Q value for the absorbing state

\begin{equation} Q_{t+1}(s_t,a_t) = Q_t(s_t,a_t) + \alpha \left( R_{t+1} + \gamma \max_a Q_t(s_{t+1}, a) - Q_t(s_t, a_t) \right) \end{equation} In the above equation there is a term \max_a Q_t(s_{t+1}, a). Now say after you take an action in state s_t resulting in…
Abhishek Bhatia
  • 9,404
  • 26
  • 87
  • 142
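The usual convention is to define the value of an absorbing state as 0, so the max term is dropped on terminal transitions; a minimal sketch of that target computation:

```python
def q_target(reward, next_q_values, gamma, done):
    """Q-learning target for a single transition.

    For an absorbing/terminal next state there are no future actions,
    so the bootstrap term is dropped (equivalently, Q of the terminal
    state is defined to be 0)."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```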
0
votes
1 answer

AI Player is not performing well? why?

I am trying to implement an agent which uses Q-learning to play Ludo. I've trained it with an ε-greedy action selector (ε = 0.1), a learning rate of 0.6, and a discount factor of 0.8. I ran the game for around 50K steps, and haven't…
Lamda
  • 914
  • 3
  • 13
  • 39