Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function that gives the expected utility (reinforcement) of taking a given action in a given state and following a fixed policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well or how badly the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily in terms of the current action-value function) against exploration (acting randomly to discover new states or actions better than those currently estimated). A common simple way of handling this trade-off is an epsilon-greedy policy, as sketched below.
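For illustration, here is a minimal epsilon-greedy action selection in Python; the one-state Q-table, the four-action set, and epsilon = 0.1 are assumed values, not taken from any particular question below:

    import random

    def epsilon_greedy(Q, state, actions, epsilon=0.1):
        # Explore with probability epsilon, otherwise act greedily on Q.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    Q = {(0, a): 0.0 for a in range(4)}   # hypothetical 1-state, 4-action table
    print(epsilon_greedy(Q, state=0, actions=[0, 1, 2, 3]))

As epsilon shrinks toward 0 over training, the agent gradually shifts from exploration to exploitation.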

447 questions
1
vote
1 answer

Q-Learning optimisation with overlapping states

I am implementing Q-learning for a simple task, which involves a robot moving to a target position, in a continuous coordinate system. Each episode has a fixed length, and the rewards are sparse: there is a single reward given to the final…
Karnivaurus
1
vote
2 answers

Reward function for learning to play Curve Fever game with DQN

I've made a simple version of Curve Fever, also known as "Achtung Die Kurve". I want the machine to figure out how to play the game optimally. I copied and slightly modified an existing DQN from some Atari game examples, built with Google's…
1
vote
1 answer

Different rewards for same state in reinforcement learning

I want to implement Q-learning for the Chrome dinosaur game (the one you can play when you are offline). I defined my state as: distance to the next obstacle, speed, and the size of the next obstacle. For the reward I wanted to use the number of…
1
vote
0 answers

Q-Values in DQN are getting too big

I have already checked this question and confirmed this is not a duplicate issue. Problem: I have implemented an agent that uses a DQN with TensorFlow to learn the optimal policy of a game called 'dots and boxes'. The algorithm appears to actually…
1
vote
0 answers

How should I choose Keras parameters for grid exploration?

I am trying to train a neural network to efficiently explore a grid to locate an object using Keras and Keras-RL. Every "step", the agent chooses a direction to explore by choosing a number from 0 to 8, where each corresponds to a cardinal or…
Harrison Grodin
1
vote
1 answer

ϵ-greedy policy with decreasing rate of exploration

I want to implement an ϵ-greedy action-selection policy in Q-learning. Many people here have used the following equation for a decreasing rate of exploration: ɛ = e^(-En), where n is the age of the agent and E is the exploitation parameter. But I am not clear what…
D_Wills
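A small sketch of that decay schedule in Python; the value of E and the sampled ages are assumptions chosen for illustration:

    import math

    def decayed_epsilon(n, E=0.01):
        # eps = e^(-E*n): equals 1.0 at n = 0 and decays toward 0 with age.
        return math.exp(-E * n)

    for n in (0, 10, 100, 1000):
        print(n, round(decayed_epsilon(n), 4))   # 1.0, 0.9048, 0.3679, 0.0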
1
vote
1 answer

Sequence with the max score?

Let's say I have n states S = {s1, s2, s3, ..., sn} and a score for every transition, i.e. a T-matrix, e.g. s1->s5 = 0.3, s4->s3 = 0.7, etc. What algorithm or procedure should I use to select the best-scored sequence/path starting from state x…
sten
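To make the setup concrete, here is a brute-force sketch over an assumed toy 3-state score matrix; this is a baseline framing, not necessarily the right algorithm for large n (a dynamic-programming pass in the style of Viterbi scales better):

    # T[i][j] = assumed toy score for the transition from state i to state j
    T = [[0.0, 0.3, 0.7],
         [0.5, 0.0, 0.2],
         [0.1, 0.6, 0.0]]

    def best_path(T, start, steps):
        # Brute force: try every path of `steps` transitions, keep the best total.
        best = (float("-inf"), None)
        def walk(state, seq, score):
            nonlocal best
            if len(seq) == steps + 1:
                best = max(best, (score, seq))
                return
            for nxt in range(len(T)):
                walk(nxt, seq + [nxt], score + T[state][nxt])
        walk(start, [start], 0.0)
        return best

    print(best_path(T, start=0, steps=2))   # (1.3, [0, 2, 1])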
1
vote
2 answers

Why doesn't my neural network Q-learner learn tic-tac-toe?

Okay, so I have created a neural network Q-learner using the same idea as DeepMind's Atari algorithm (except I feed it raw data, not pictures (yet)). Neural network build: 9 inputs (0 for an empty spot, 1 for "X", -1 for "O"), 1 hidden layer with 9-50…
1
vote
1 answer

Pybrain reinforcement learning; dimension of state

I am working on a project to combine reinforcement learning with traffic light simulations using the package Pybrain. I have read the tutorial and implemented my own subclasses of Environment and Task. I am using an ActionValueNetwork as controller…
1
vote
1 answer

Why does Q-learning work in an unknown environment?

Q-learning uses an instant reward matrix R to model an environment. That means it uses a known matrix R for learning, so why do people say "Q-learning can work in an unknown environment"?
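For context, the standard tabular update only ever uses the single reward sampled from the environment at each step; the full matrix R never has to be known in advance. A minimal sketch (the 2x2 table, alpha, and gamma are assumed values):

    def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
        # Only the observed reward r enters the update, not all of R.
        target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    Q = {(s, a): 0.0 for s in range(2) for a in range(2)}
    q_update(Q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1])
    print(Q[(0, 1)])   # 0.1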
1
vote
1 answer

Programmatically find next state for max(Q(s',a')) in Q-learning using R

I am writing a simple grid-world Q-learning program using R. This is my grid world. This simple grid world has 6 states, of which state 1 and state 6 are the starting and ending states. I avoided adding a fire pit, walls, or wind so as to keep my grid world as…
Eka
1
vote
1 answer

Can the Q-learning algorithm become overtrained?

It has been proved that the Q-learning algorithm converges to the Q-values of the optimal policy, which are unique. So is it correct to conclude that the Q-learning algorithm cannot become overtrained?
1
vote
1 answer

Q-learning with function approximation where each state doesn't have the same set of actions

I am applying Q-learning with function approximation to a problem where each state doesn't have the same set of actions. When I am calculating the target, Target = R(s,a,s') + γ max_a' Q(s',a'), since each state does not have the same set of actions…
Prabir
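A common way to handle this, sketched here under assumed names (the linear q_value, the feature map phi, and valid_actions are hypothetical), is to take the max only over the actions actually legal in s':

    def q_value(w, phi, s, a):
        # Hypothetical linear approximation: Q(s, a) = w . phi(s, a)
        return sum(wi * xi for wi, xi in zip(w, phi(s, a)))

    def td_target(r, s_next, valid_actions, w, phi, gamma=0.9):
        # Max only over the actions available in s'; with no legal
        # actions, treat s' as terminal and do not bootstrap.
        legal = valid_actions(s_next)
        if not legal:
            return r
        return r + gamma * max(q_value(w, phi, s_next, a) for a in legal)

    w = [0.5, -0.2]
    phi = lambda s, a: [s, a]                        # toy features
    valid_actions = lambda s: [0, 1] if s < 3 else []
    print(td_target(1.0, 2, valid_actions, w, phi))  # 1.9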
1
vote
0 answers

How can I choose the features for my Q-learning with linear function approximation?

I am developing an AI using reinforcement learning. It is a game in which the player should avoid bricks falling from the sky. There are 20 bricks falling to the ground. game screen shot, game play video link. I implemented the AI using reinforcement learning with…
1
vote
1 answer

How do you normalize weights in Q-learning with linear function approximation?

I am developing a simple game program to show Q-learning with linear function approximation. screen shot In this game, there are countless states. I have to consider many factors like the player's position, speed, and the enemy's position (there are 12 ~ 15…
Juho Sung
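One common approach (an assumption here, not necessarily the asker's setup) is to normalize each raw feature into [0, 1] before the linear combination, so that no single factor dominates the weight updates:

    def normalize(value, lo, hi):
        # Scale a raw value into [0, 1].
        return (value - lo) / (hi - lo)

    def features(player_x, speed, enemy_x, screen_w=800, max_speed=10.0):
        # Hypothetical feature vector for a game like the one described.
        return [normalize(player_x, 0, screen_w),
                normalize(speed, 0, max_speed),
                normalize(enemy_x, 0, screen_w)]

    print(features(player_x=400, speed=5.0, enemy_x=600))   # [0.5, 0.5, 0.75]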