Questions tagged [reinforcement-learning]

Reinforcement learning is an area of machine learning and computer science concerned with how to select an action in a state that maximizes a numerical reward in a particular environment.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, Artificial Intelligence, or Computer Science instead. Otherwise you're probably off-topic.

Reinforcement learning is learning what to do--how to map situations to actions--so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards. These two characteristics--trial-and-error search and delayed reward--are the two most important distinguishing features of reinforcement learning.

From Reinforcement Learning: An Introduction

2632 questions
148
votes
7 answers

How to train an artificial neural network to play Diablo 2 using visual input?

I'm currently trying to get an ANN to play a video game and I was hoping to get some help from the wonderful community here. I've settled on Diablo 2. Game play is thus in real-time and from an isometric viewpoint, with the player controlling a…
146
votes
8 answers

What is the difference between Q-learning and SARSA?

Although I know that SARSA is on-policy while Q-learning is off-policy, when looking at their formulas it's hard (to me) to see any difference between these two algorithms. According to the book Reinforcement Learning: An Introduction (by Sutton and…
Ælex
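
As a rough illustration of the on-policy/off-policy distinction this question asks about, the sketch below applies both update rules to the same single transition. The Q-table values, `alpha`, `gamma`, and the function names are made up for illustration and are not taken from the question or from any library. The only difference is the bootstrap term: SARSA uses the action the agent actually takes next, while Q-learning uses the greedy action in the next state.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy: bootstrap from the action actually taken in s_next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
    return Q


def q_learning_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Off-policy: bootstrap from the greedy (max-value) action in s_next."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q


def fresh_q_table():
    # Hypothetical Q-table: 2 states x 2 actions.
    return [[0.0, 0.0], [1.0, 2.0]]


# Same transition (s=0, a=0, r=1, s_next=1); the agent's next action is a_next=0,
# which is NOT the greedy action in state 1, so the two targets differ.
q_sarsa = sarsa_update(fresh_q_table(), s=0, a=0, r=1.0, s_next=1, a_next=0)
q_qlearn = q_learning_update(fresh_q_table(), s=0, a=0, r=1.0, s_next=1)
```

When the next action happens to be the greedy one, the two updates coincide; they differ only when the behaviour policy explores.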
140
votes
5 answers

What is the difference between value iteration and policy iteration?

In reinforcement learning, what is the difference between policy iteration and value iteration? As far as I understand, in value iteration, you use the Bellman equation to solve for the optimal policy, whereas, in policy iteration, you randomly…
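
To make the comparison concrete, here is a minimal sketch of both algorithms on a tiny deterministic MDP. The MDP, the discount `GAMMA`, and all function names are hypothetical and chosen only for illustration. Value iteration repeatedly applies the Bellman optimality backup to the value function and reads the policy off at the end; policy iteration alternates between evaluating a fixed policy and greedily improving it.

```python
GAMMA = 0.9
TERMINAL = 2
# Hypothetical deterministic MDP: MDP[state][action] = (next_state, reward).
MDP = {
    0: {0: (0, 0.0), 1: (1, 0.0)},
    1: {0: (0, 0.0), 1: (2, 1.0)},
}


def q_value(V, s, a):
    """One-step lookahead value of taking action a in state s."""
    s_next, r = MDP[s][a]
    return r + GAMMA * V[s_next]


def greedy_policy(V):
    return {s: max(MDP[s], key=lambda a: q_value(V, s, a)) for s in MDP}


def value_iteration(tol=1e-9):
    V = {0: 0.0, 1: 0.0, TERMINAL: 0.0}
    while True:
        delta = 0.0
        for s in MDP:
            best = max(q_value(V, s, a) for a in MDP[s])  # Bellman optimality backup
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V, greedy_policy(V)


def policy_iteration(tol=1e-9):
    policy = {s: 0 for s in MDP}  # arbitrary starting policy
    while True:
        # Policy evaluation: iterate the Bellman expectation backup for the fixed policy.
        V = {0: 0.0, 1: 0.0, TERMINAL: 0.0}
        while True:
            delta = 0.0
            for s in MDP:
                v = q_value(V, s, policy[s])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: act greedily with respect to the evaluated values.
        new_policy = greedy_policy(V)
        if new_policy == policy:
            return V, policy
        policy = new_policy


v_vi, pi_vi = value_iteration()
v_pi, pi_pi = policy_iteration()
```

On this toy problem both routines end with the same policy and the same state values; they differ in how the work is split between evaluation and improvement.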
68
votes
2 answers

Training a Neural Network with Reinforcement learning

I know the basics of feedforward neural networks, and how to train them using the backpropagation algorithm, but I'm looking for an algorithm that I can use for training an ANN online with reinforcement learning. For example, the cart pole swing up…
56
votes
4 answers

What is the way to understand Proximal Policy Optimization Algorithm in RL?

I know the basics of reinforcement learning, but which terms do I need to understand to be able to read the arXiv PPO paper? What is the roadmap to learn and use PPO?
Alexander Cyberman
51
votes
6 answers

How can I apply reinforcement learning to continuous action spaces?

I'm trying to get an agent to learn the mouse movements necessary to best perform some task in a reinforcement learning setting (i.e. the reward signal is the only feedback for learning). I'm hoping to use the Q-learning technique, but while I've…
zergylord
46
votes
3 answers

What is a policy in reinforcement learning?

I've seen such words as: A policy defines the learning agent's way of behaving at a given time. Roughly speaking, a policy is a mapping from perceived states of the environment to actions to be taken when in those states. But still didn't fully…
41
votes
3 answers

What is the difference between Q-learning and Value Iteration?

How is Q-learning different from value iteration in reinforcement learning? I know Q-learning is model-free and training samples are transitions (s, a, s', r). But since we know the transitions and the reward for every transition in Q-learning, is…
38
votes
1 answer

OpenAI Gym: Understanding `action_space` notation (spaces.Box)

I want to set up an RL agent on the OpenAI CarRacing-v0 environment, but before that I want to understand the action space. In the code on GitHub, line 119 says: `self.action_space = spaces.Box( np.array([-1,0,0]), np.array([+1,+1,+1])) # steer, gas,…`
Toke Faurby
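
One way to read the `spaces.Box(low, high)` call in the excerpt: the two arrays give a per-component lower and upper bound, so each entry of an action lies in its own interval (here `[-1, 1]`, `[0, 1]`, and `[0, 1]`). The sketch below mimics that idea in plain Python so it runs without Gym installed. `sample_action`, `contains`, and `clip_action` are hypothetical helper names, not Gym's API, and the sketch is not how Gym implements `Box`.

```python
import random

# Bounds copied from the excerpt:
#   spaces.Box(np.array([-1,0,0]), np.array([+1,+1,+1]))
LOW = [-1.0, 0.0, 0.0]
HIGH = [1.0, 1.0, 1.0]


def sample_action(rng):
    """Draw each component uniformly from its own [low, high] interval."""
    return [rng.uniform(lo, hi) for lo, hi in zip(LOW, HIGH)]


def contains(action):
    """True if the action has the right length and every component is in bounds."""
    return len(action) == len(LOW) and all(
        lo <= x <= hi for lo, hi, x in zip(LOW, HIGH, action)
    )


def clip_action(action):
    """Clamp each component into its interval."""
    return [min(max(x, lo), hi) for lo, hi, x in zip(LOW, HIGH, action)]


random_action = sample_action(random.Random(0))
clipped = clip_action([2.0, -0.5, 0.3])
```

So the first component can be negative or positive, while the other two are restricted to non-negative values up to 1.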
35
votes
2 answers

What is the difference between reinforcement learning and deep RL?

What is the difference between deep reinforcement learning and reinforcement learning? I basically know what reinforcement learning is about, but what does the concrete term deep stand for in this context?
34
votes
5 answers

When should I use support vector machines as opposed to artificial neural networks?

I know SVMs are supposedly 'ANN killers' in that they automatically select representation complexity and find a global optimum (see here for some SVM praising quotes). But here is where I'm unclear -- do all of these claims of superiority hold for…
zergylord
31
votes
4 answers

OpenAI Gym environment for multi-agent games

Is it possible to use OpenAI's Gym environments for multi-agent games? Specifically, I would like to model a card game with four players (agents). The player scoring a turn starts the next turn. How would I model the necessary coordination between…
Martin Studer
30
votes
2 answers

TensorFlow and Multiprocessing: Passing Sessions

I have recently been working on a project that uses a neural network for virtual robot control. I used TensorFlow to code it up and it runs smoothly. So far, I used sequential simulations to evaluate how good the neural network is, however, I want…
27
votes
6 answers

NameError: name 'base' is not defined OpenAI Gym

[Note that I am using `xvfb-run -s "-screen 0 1400x900x24" jupyter notebook`] I try to run a basic set of commands in OpenAI Gym: `import gym`, `env = gym.make("CartPole-v0")`, `obs = env.reset()`, `env.render()`, but I get the following…
midawn98
26
votes
9 answers

Good implementations of reinforcement learning?

For an AI class project I need to implement a reinforcement learning algorithm which beats a simple game of Tetris. The game is written in Java and we have the source code. I know the basics of reinforcement learning theory but was wondering if…