
What's the best way to implement real-time operant conditioning (supervised reward/punishment-based learning) for an agent? Should I use a neural network (and what type)? Or something else?

I want the agent to be trainable to follow commands like a dog. The commands would be gestures on a touchscreen. The agent should learn to follow a path (in continuous 2D space), make behavioral changes on command (modeled by FSM state transitions), and perform sequences of actions.

The agent would be in a simulated physical environment.

Ken
    `What's the best way..` is seldom a good question in the field of AI. There is a lot that goes into it, and what fits perfectly for one problem is usually bad for a different one. What exactly are you trying to achieve? What is the agent exactly? What algorithm does it use? ... – amit Nov 24 '12 at 21:00

1 Answer


Reinforcement Learning is a good machine learning algorithm for your problem.

The basic reinforcement learning model consists of:

  • a set of environment states S (you have a 2D space discretized in some way, giving the dog's current position; if you want continuous 2D space, you might need a neural network to serve as a value-function approximator)
  • a set of actions A (you mentioned the dog performs sequences of actions, e.g., move, rotate)
  • rules of transitioning between states (your dog's position transitions can be modeled by an FSM)
  • rules that determine the scalar immediate reward r of a transition (when the dog reaches the target position you might give it a big reward, while small rewards at intermediate milestones are also welcome)
  • rules that describe what the agent observes (the dog might have a limited view; for example, only the 4 or 8 neighboring cells are visible. The figure below shows the dog's current position P and the 4 neighboring cells visible to it.)

(figure: a grid with the dog's current position P and the 4 neighboring cells it can see)
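The limited-view observation in the last point can be sketched like this, assuming the 2D space is discretized into a grid (the grid contents and cell symbols here are illustrative assumptions):

```python
def observe(grid, pos):
    """Return the contents of the 4 cells adjacent to the dog's position.

    Cells outside the grid are reported as None (a wall).
    """
    rows, cols = len(grid), len(grid[0])
    r, c = pos
    neighbors = {}
    for name, (dr, dc) in {"up": (-1, 0), "down": (1, 0),
                           "left": (0, -1), "right": (0, 1)}.items():
        nr, nc = r + dr, c + dc
        inside = 0 <= nr < rows and 0 <= nc < cols
        neighbors[name] = grid[nr][nc] if inside else None
    return neighbors

# Example: a 3x3 grid with the dog P at the center and a goal G to its right
grid = [[".", ".", "."],
        [".", "P", "G"],
        [".", ".", "."]]
print(observe(grid, (1, 1)))
# {'up': '.', 'down': '.', 'left': '.', 'right': 'G'}
```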

To find the optimal policy, you can start with a model-free technique such as Q-learning.

greeness
  • RL is not an algorithm. It's a very general problem definition. Thus it cannot be the answer to this question. – ziggystar Nov 24 '12 at 22:31
  • Yes, you can say RL is not an algorithm. It's a model, or problem definition. What I am trying to point out is that the OP can use the value-function approach to derive an **online algorithm** in order to train the agent to learn what are the best actions under various states. This is the core idea of RL, right? – greeness Nov 24 '12 at 23:11
  • Okay, in that case I should probably use an LSTM neural network. I did a little reading and came across temporal difference learning (http://en.wikipedia.org/wiki/Temporal_difference), which is a type of reinforcement learning that seems to be what I'm looking for. Thanks! – Ken Dec 02 '12 at 19:39