Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function that gives the expected utility (reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well, or how badly, the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states or actions better than currently estimated). A common, simple way to handle this trade-off is an epsilon-greedy policy.
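As an illustration of the exploration/exploitation balance described above, here is a minimal sketch of tabular Q-learning with an epsilon-greedy policy (the table shape, hyperparameters, and function names below are illustrative assumptions, not taken from any particular question):

    import numpy as np

    def epsilon_greedy_action(Q, state, epsilon, rng):
        # Explore: with probability epsilon pick a uniformly random action.
        if rng.random() < epsilon:
            return int(rng.integers(Q.shape[1]))
        # Exploit: otherwise act greedily with respect to the current estimates.
        return int(np.argmax(Q[state]))

    def q_learning_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
        # Off-policy TD target: reward plus discounted value of the greedy next action.
        td_target = reward + gamma * np.max(Q[next_state])
        Q[state, action] += alpha * (td_target - Q[state, action])

    # Example: a 5-state, 2-action table updated with one hypothetical transition.
    rng = np.random.default_rng(0)
    Q = np.zeros((5, 2))
    a = epsilon_greedy_action(Q, state=0, epsilon=0.1, rng=rng)
    q_learning_update(Q, state=0, action=a, reward=1.0, next_state=1)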

447 questions
0
votes
1 answer

DQN not converging

I am trying to implement DQN in openai-gym's "lunar lander" environment. It shows no sign of converging after 3000 training episodes (for comparison, a very simple policy gradient method converges after 2000 episodes). I went through my code for…
Na1ve
  • 11
  • 3
0
votes
1 answer

Enhancement of Agent Training Q Learning Taxi V3

episode_number = 10000 for i in range(1,episode_number): state = env.reset() reward_count = 0 dropouts = 0 while True: if random.uniform(0,1) < epsilon: action =…
raja
  • 1
  • 1
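The excerpt above looks like a tabular Q-learning loop for Taxi-v3; a hedged, self-contained sketch of that kind of loop follows (the hyperparameters and the classic gym reset/step API are assumptions; newer gymnasium releases return (state, info) from reset and a 5-tuple from step):

    import random
    import numpy as np
    import gym

    env = gym.make("Taxi-v3")
    q_table = np.zeros((env.observation_space.n, env.action_space.n))
    alpha, gamma, epsilon = 0.1, 0.9, 0.1

    for episode in range(1, 10001):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection.
            if random.uniform(0, 1) < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(q_table[state]))
            next_state, reward, done, _ = env.step(action)
            # Standard Q-learning update.
            q_table[state, action] += alpha * (
                reward + gamma * np.max(q_table[next_state]) - q_table[state, action]
            )
            state = next_state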
0
votes
0 answers

What is the purpose of the observation_space in OpenAI Gym if I am going to input the state of the environment into my DQN for training

I am confused by the two terms 'observation_space' and 'state', and I do not see the purpose of even having 'observation_space' in my code in the first place. I have seen other answers, but I dove deeper into the code of RL algorithms…
0
votes
1 answer

ValueError: Error when checking input: expected Input_input to have 4 dimensions, but got array with shape (1, 1, 2)

I am trying to create a Flappy Bird AI with convolutional and dense layers, but at the "Train" step (the fit() function) I get the following error message: dqn.fit(env, nb_steps=500000, visualize=False, verbose=2) Training for 500000 steps…
0
votes
2 answers

How to cast function into a struct in C?

This is my first post on StackOverflow, so I hope the format will be okay. I want to pass functions as parameter to another function. To that end, I declare a struct to describe functions. But then, I get an invalid analyser error on compilation. In…
0
votes
1 answer

How does the is_slippery parameter affect the reward in Frozenlake Environment?

How does the is_slippery parameter affect the reward in the FrozenLake environment? The FrozenLake environment has a parameter named is_slippery which, if set to True, will move the agent in the intended direction with a probability of 1/3, else will move in either…
Anwesa Roy
  • 67
  • 1
  • 10
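A hedged comparison sketch (the FrozenLake-v1 environment id and the classic gym step API are assumptions): is_slippery does not change the reward function itself, only the transition dynamics, which in turn changes how much reward a given policy actually collects.

    import gym
    import numpy as np

    def average_return(is_slippery, episodes=1000):
        # is_slippery only affects the transitions: when True, the chosen action
        # is executed with probability 1/3 and a perpendicular move happens
        # otherwise. The reward (1 for reaching the goal, 0 elsewhere) stays the
        # same, but the goal becomes harder to reach.
        env = gym.make("FrozenLake-v1", is_slippery=is_slippery)
        returns = []
        for _ in range(episodes):
            state, done, total = env.reset(), False, 0.0
            while not done:
                state, reward, done, _ = env.step(env.action_space.sample())
                total += reward
            returns.append(total)
        return np.mean(returns)

    print(average_return(is_slippery=False), average_return(is_slippery=True))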
0
votes
1 answer

Are there benefits to having Actor and Critic use significantly different models?

In Actor-Critic methods the Actor and Critic are assigned two complementary, but different, goals. I'm trying to understand whether the differences between these goals (updating a policy and updating a value function) are large enough to warrant…
0
votes
0 answers

Reinforcement Learning Example

Environment: There are 25 total turns. There are two types of actions: build CS and build CI. Goal: Find the max number of CIs (buildings) which can be built in the total number of turns given using specifically machine-learning/reinforced…
0
votes
1 answer

Learning Curve in Q-learning

My question: I wrote the Q-learning algorithm in C++ with an epsilon-greedy policy, and now I have to plot the learning curve for the Q-values. What exactly should I plot? I have an 11x5 Q matrix, so should I take one Q value and plot its…
Nifty
  • 67
  • 8
0
votes
1 answer

How should I code the Gambler's Problem with Q-learning (without any reinforcement learning packages)?

I would like to solve the Gambler's problem as an MDP (Markov Decision Process). Gambler's problem: A gambler has the opportunity to make bets on the outcomes of a sequence of coin flips. If the coin comes up heads, he wins as many dollars as he has…
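One possible plain-Python setup, sketched under assumed values for the head probability, the capital target, and the learning hyperparameters (none of these come from the question itself):

    import random

    P_HEADS, TARGET = 0.4, 100
    ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.1

    # Q[state][stake] for capital 1..99 and legal stakes 1..min(state, TARGET - state).
    Q = {s: {a: 0.0 for a in range(1, min(s, TARGET - s) + 1)} for s in range(1, TARGET)}

    def flip(state, stake):
        # Win the stake with probability P_HEADS, lose it otherwise; reward is 1
        # only when the target capital is reached.
        next_state = state + stake if random.random() < P_HEADS else state - stake
        reward = 1.0 if next_state == TARGET else 0.0
        return next_state, reward, next_state in (0, TARGET)

    for _ in range(50000):
        state, done = random.randint(1, TARGET - 1), False
        while not done:
            stakes = list(Q[state])
            if random.random() < EPSILON:
                stake = random.choice(stakes)          # explore
            else:
                stake = max(stakes, key=Q[state].get)  # exploit
            next_state, reward, done = flip(state, stake)
            target = reward if done else reward + GAMMA * max(Q[next_state].values())
            Q[state][stake] += ALPHA * (target - Q[state][stake])
            state = next_state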
0
votes
2 answers

Python code using multiprocessing running infinitely

I am trying to execute the following code in a Jupyter notebook using multiprocessing, but the loop runs indefinitely. I need help resolving this issue. import multiprocessing as mp import numpy as np def square(x): return np.square(x) x =…
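A commonly suggested fix, sketched here as a plain script rather than a notebook cell (in a Jupyter notebook the worker function usually also has to live in an importable module); the array size and pool size below are illustrative:

    import multiprocessing as mp
    import numpy as np

    def square(x):
        return np.square(x)

    if __name__ == "__main__":
        # Without this guard, spawn-based platforms re-import the module in every
        # worker process, which re-creates the pool and can appear to run forever.
        x = np.arange(10)
        with mp.Pool(processes=4) as pool:
            results = pool.map(square, x)
        print(results)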
0
votes
1 answer

Helipad Co-ordinates of LunarLander v2 openai gym

I am trying to implement a custom lunar lander environment by adapting the existing LunarLander-v2. https://github.com/openai/gym/blob/master/gym/envs/box2d/lunar_lander.py I'm having a hard time figuring out the pole co-ordinates of the…
Shan
  • 1
  • 1
0
votes
1 answer

ValueError: Model output "Tensor("activation_1/Identity:0", shape=(?, 3), dtype=float32)" has invalid shape

I am trying to run the following GitHub code for stock market prediction: https://github.com/multidqn/deep-q-trading Following their instructions, I run the following after installing the required libraries: python main.py 3 0 results_folder However,…
Aniss Chohra
  • 391
  • 4
  • 18
0
votes
0 answers

ValueError: operands could not be broadcast together with shapes - Keras

I am training an agent using demonstrations of another agent provided in the form of (state, action, reward, next_state) tuples. I am using Keras and Sklearn. This is how the q learning works: def q_learning_model(): NUM_STATES =…
marine
  • 1
  • 3
0
votes
0 answers

Any Alternative API for tf.placeholder in reinforcement learning?

I am making an agent for CartPole using a Q-network. I am following an online lecture, but it was recorded before TensorFlow 2 and uses the placeholder API, which has been removed in TensorFlow 2. I want to find an alternative solution for…
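Two commonly used replacements, sketched under the assumption of a CartPole-sized input (4 state dimensions, 2 actions): either keep the v1-style graph code via tf.compat.v1, or build a tf.keras model and feed it tensors directly instead of placeholders.

    import tensorflow as tf

    # Idiomatic TF2 replacement for a placeholder-fed Q-network: define a Keras
    # model and call it on tensors (or numpy arrays) directly.
    q_net = tf.keras.Sequential([
        tf.keras.layers.Dense(24, activation="relu", input_shape=(4,)),  # 4 CartPole state dims
        tf.keras.layers.Dense(2),  # one Q-value per action
    ])

    q_values = q_net(tf.zeros([1, 4]))  # feed a batch of states where the placeholder used to be
    print(q_values.numpy())

    # Alternatively, old v1-style code can often be kept as-is by calling
    # tf.compat.v1.disable_eager_execution() and using tf.compat.v1.placeholder.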