Questions tagged [q-learning]

Q-learning is a model-free reinforcement learning technique.

Q-learning is a model-free, off-policy reinforcement learning technique that aims to find an action-value function that gives the expected utility (reinforcement) of taking a given action in a given state and following the optimal policy thereafter.

One of the strengths of Q-learning is that it needs only a reinforcement function to be given (i.e. a function which tells how well, or how badly, the agent is performing). During the learning process, the agent needs to balance exploitation (acting greedily with respect to the current action-value function) against exploration (acting randomly to discover new states or actions better than currently estimated). A common, simple way to handle this trade-off is an epsilon-greedy policy.
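As an illustration of the exploration/exploitation balance described above, here is a minimal sketch of tabular Q-learning with an epsilon-greedy policy (the table shape, hyperparameters, and function names below are illustrative assumptions, not taken from any particular question):

    import numpy as np

    def epsilon_greedy_action(Q, state, epsilon, rng):
        # Explore: with probability epsilon pick a uniformly random action.
        if rng.random() < epsilon:
            return int(rng.integers(Q.shape[1]))
        # Exploit: otherwise act greedily with respect to the current estimates.
        return int(np.argmax(Q[state]))

    def q_learning_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
        # Off-policy TD target: reward plus discounted value of the greedy next action.
        td_target = reward + gamma * np.max(Q[next_state])
        Q[state, action] += alpha * (td_target - Q[state, action])

    # Example: a 5-state, 2-action table updated with one hypothetical transition.
    rng = np.random.default_rng(0)
    Q = np.zeros((5, 2))
    a = epsilon_greedy_action(Q, state=0, epsilon=0.1, rng=rng)
    q_learning_update(Q, state=0, action=a, reward=1.0, next_state=1)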

447 questions
0
votes
1 answer

DQN not converging

I am trying to implement DQN in openai-gym's "lunar lander" environment. It shows no sign of converging after 3000 training episodes (for comparison, a very simple policy gradient method converges after 2000 episodes). I went through my code for…
Na1ve
  • 11
  • 3
0
votes
1 answer

Enhancement of Agent Training Q Learning Taxi V3

episode_number = 10000 for i in range(1,episode_number): state = env.reset() reward_count = 0 dropouts = 0 while True: if random.uniform(0,1) < epsilon: action =…
raja
  • 1
  • 1
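The excerpt above looks like a tabular Q-learning loop for Taxi-v3; a hedged, self-contained sketch of that kind of loop follows (the hyperparameters and the classic gym reset/step API are assumptions; newer gymnasium releases return (state, info) from reset and a 5-tuple from step):

    import random
    import numpy as np
    import gym

    env = gym.make("Taxi-v3")
    q_table = np.zeros((env.observation_space.n, env.action_space.n))
    alpha, gamma, epsilon = 0.1, 0.9, 0.1

    for episode in range(1, 10001):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection.
            if random.uniform(0, 1) < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(q_table[state]))
            next_state, reward, done, _ = env.step(action)
            # Standard Q-learning update.
            q_table[state, action] += alpha * (
                reward + gamma * np.max(q_table[next_state]) - q_table[state, action]
            )
            state = next_state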
0
votes
0 answers

What is the purpose of the observation_space in OpenAI Gym if I am going to input the state of the environment into my DQN for training

I am confused by the two terms 'observation_space' and 'state', and I do not see the purpose of even having 'observation_space' in my code in the first place. I have seen other answers, but I dove deeper into the code of RL algorithms…
0
votes
1 answer

ValueError: Error when checking input: expected Input_input to have 4 dimensions, but got array with shape (1, 1, 2)

I am trying to create a Flappy Bird AI with convolutional and dense layers, but at the "Train" step (the fit() function) I get the following error message: dqn.fit(env, nb_steps=500000, visualize=False, verbose=2) Training for 500000 steps…
0
votes
2 answers

How to cast function into a struct in C?

This is my first post on StackOverflow, so I hope the format will be okay. I want to pass functions as parameter to another function. To that end, I declare a struct to describe functions. But then, I get an invalid analyser error on compilation. In…
0
votes
1 answer

How does the is_slippery parameter affect the reward in Frozenlake Environment?

How does the is_slippery parameter affect the reward in the FrozenLake environment? The FrozenLake environment has a parameter named is_slippery which, if set to True, will move the agent in the intended direction with a probability of 1/3, else will move in either…
Anwesa Roy
  • 67
  • 1
  • 10
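A hedged comparison sketch (the FrozenLake-v1 environment id and the classic gym step API are assumptions): is_slippery does not change the reward function itself, only the transition dynamics, which in turn changes how much reward a given policy actually collects.

    import gym
    import numpy as np

    def average_return(is_slippery, episodes=1000):
        # is_slippery only affects the transitions: when True, the chosen action
        # is executed with probability 1/3 and a perpendicular move happens
        # otherwise. The reward (1 for reaching the goal, 0 elsewhere) stays the
        # same, but the goal becomes harder to reach.
        env = gym.make("FrozenLake-v1", is_slippery=is_slippery)
        returns = []
        for _ in range(episodes):
            state, done, total = env.reset(), False, 0.0
            while not done:
                state, reward, done, _ = env.step(env.action_space.sample())
                total += reward
            returns.append(total)
        return np.mean(returns)

    print(average_return(is_slippery=False), average_return(is_slippery=True))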
0
votes
1 answer

Are there benefits to having Actor and Critic use significantly different models?

In Actor-Critic methods the Actor and Critic are assigned two complementary, but different, goals. I'm trying to understand whether the differences between these goals (updating a policy and updating a value function) are large enough to warrant…
0
votes
0 answers

Reinforcement Learning Example

Environment: There are 25 total turns. There are two types of actions: build CS and build CI. Goal: Find the max number of CIs (buildings) which can be built in the total number of turns given using specifically machine-learning/reinforced…
0
votes
1 answer

Learning Curve in Q-learning

My question: I wrote the Q-learning algorithm in C++ with an epsilon-greedy policy, and now I have to plot the learning curve for the Q-values. What exactly should I plot? I have an 11x5 Q matrix, so should I take one Q value and plot its…
Nifty
  • 67
  • 8
0
votes
1 answer

How should I code the Gambler's Problem with Q-learning (without any reinforcement learning packages)?

I would like to solve the Gambler's problem as an MDP (Markov Decision Process). Gambler's problem: A gambler has the opportunity to make bets on the outcomes of a sequence of coin flips. If the coin comes up heads, he wins as many dollars as he has…
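One possible plain-Python setup, sketched under assumed values for the head probability, the capital target, and the learning hyperparameters (none of these come from the question itself):

    import random

    P_HEADS, TARGET = 0.4, 100
    ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.1

    # Q[state][stake] for capital 1..99 and legal stakes 1..min(state, TARGET - state).
    Q = {s: {a: 0.0 for a in range(1, min(s, TARGET - s) + 1)} for s in range(1, TARGET)}

    def flip(state, stake):
        # Win the stake with probability P_HEADS, lose it otherwise; reward is 1
        # only when the target capital is reached.
        next_state = state + stake if random.random() < P_HEADS else state - stake
        reward = 1.0 if next_state == TARGET else 0.0
        return next_state, reward, next_state in (0, TARGET)

    for _ in range(50000):
        state, done = random.randint(1, TARGET - 1), False
        while not done:
            stakes = list(Q[state])
            if random.random() < EPSILON:
                stake = random.choice(stakes)          # explore
            else:
                stake = max(stakes, key=Q[state].get)  # exploit
            next_state, reward, done = flip(state, stake)
            target = reward if done else reward + GAMMA * max(Q[next_state].values())
            Q[state][stake] += ALPHA * (target - Q[state][stake])
            state = next_state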
0
votes
2 answers

Python code using multiprocessing running infinitely

I am trying to execute the following code in a Jupyter notebook using multiprocessing, but the loop runs indefinitely. I need help resolving this issue. import multiprocessing as mp import numpy as np def square(x): return np.square(x) x =…
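A commonly suggested fix, sketched here as a plain script rather than a notebook cell (in a Jupyter notebook the worker function usually also has to live in an importable module); the array size and pool size below are illustrative:

    import multiprocessing as mp
    import numpy as np

    def square(x):
        return np.square(x)

    if __name__ == "__main__":
        # Without this guard, spawn-based platforms re-import the module in every
        # worker process, which re-creates the pool and can appear to run forever.
        x = np.arange(10)
        with mp.Pool(processes=4) as pool:
            results = pool.map(square, x)
        print(results)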
0
votes
1 answer

Helipad Co-ordinates of LunarLander v2 openai gym

I am trying to implement a custom lunar lander environment by adapting the existing LunarLander-v2. https://github.com/openai/gym/blob/master/gym/envs/box2d/lunar_lander.py I'm having a hard time figuring out the pole co-ordinates of the…
Shan
  • 1
  • 1
0
votes
1 answer

ValueError: Model output "Tensor("activation_1/Identity:0", shape=(?, 3), dtype=float32)" has invalid shape

I am trying to run the following GitHub code for stock market prediction: https://github.com/multidqn/deep-q-trading Following their instructions, I run the following after installing the required libraries: python main.py 3 0 results_folder However,…
Aniss Chohra
  • 391
  • 4
  • 18
0
votes
0 answers

ValueError: operands could not be broadcast together with shapes - Keras

I am training an agent using demonstrations of another agent provided in the form of (state, action, reward, next_state) tuples. I am using Keras and Sklearn. This is how the q learning works: def q_learning_model(): NUM_STATES =…
marine
  • 1
  • 3
0
votes
0 answers

Any Alternative API for tf.placeholder in reinforcement learning?

I am making an agent for CartPole using a Q-network. I am following an online lecture, but it was recorded before TensorFlow 2 and uses the placeholder API, which has been removed in TensorFlow 2. I want to find an alternative solution for…
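Two commonly used replacements, sketched under the assumption of a CartPole-sized input (4 state dimensions, 2 actions): either keep the v1-style graph code via tf.compat.v1, or build a tf.keras model and feed it tensors directly instead of placeholders.

    import tensorflow as tf

    # Idiomatic TF2 replacement for a placeholder-fed Q-network: define a Keras
    # model and call it on tensors (or numpy arrays) directly.
    q_net = tf.keras.Sequential([
        tf.keras.layers.Dense(24, activation="relu", input_shape=(4,)),  # 4 CartPole state dims
        tf.keras.layers.Dense(2),  # one Q-value per action
    ])

    q_values = q_net(tf.zeros([1, 4]))  # feed a batch of states where the placeholder used to be
    print(q_values.numpy())

    # Alternatively, old v1-style code can often be kept as-is by calling
    # tf.compat.v1.disable_eager_execution() and using tf.compat.v1.placeholder.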