
I am working on a side project that models the inverted pendulum problem and solves it with a reinforcement learning algorithm, specifically Q-learning. I have already engineered a simple MDP solver for a grid world - easy stuff.

However, I am struggling to figure out how to do this after days of scouring research papers. Nothing explains how to build up a framework for representing the problem.

When modelling the problem, can a standard Markov Decision Process be used? Or must it be a POMDP?

What is represented in each state (i.e. what state info is passed to the agent)? The coordinates, velocity, angle of the pendulum, etc.?

What actions can the agent take? Is it a continuous range of velocities in the + or - x direction?

Advice on this is greatly appreciated.

1 Answer


"Reinforcement Learning: An Introduction" by Richard S. Sutton and Andrew G. Barto is the default book on reinforcement learning and they also talk about the cart pole problem (http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html). Sutton also offers the C code of the cart pole problem: http://webdocs.cs.ualberta.ca/~sutton/book/code/pole.c Of course there are many implementations of the problem online: https://github.com/stober/cartpole

There are multiple ways to set up the problem, depending on how hard you want to make it.

  • You can model it as an MDP or as a POMDP.
  • The state can consist of position, velocity, angle, and angular velocity, or any subset of these.
  • You can discretize the state space, or you can use function approximation.
  • The actions can be simply the minimum and maximum acceleration (discrete), or anything in between (discrete or continuous).
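As a concrete starting point, here is a minimal sketch (in Python) of the discretization route: bin the four-dimensional continuous state so your existing tabular Q-learning machinery can be reused. The bounds, bin counts, and the two bang-bang actions (push left / push right at full force) are illustrative assumptions, not fixed parts of the problem.

```python
import numpy as np

# Per-dimension bounds: cart position, cart velocity, pole angle (rad),
# pole angular velocity. Ranges and bin counts are illustrative choices.
BOUNDS = [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.0, 3.0)]
BINS = [6, 6, 12, 12]

def discretize(state):
    """Map a continuous 4-tuple state to a tuple of bin indices."""
    idx = []
    for value, (lo, hi), n in zip(state, BOUNDS, BINS):
        value = min(max(value, lo), hi)                 # clip to the bounds
        idx.append(min(int((value - lo) / (hi - lo) * n), n - 1))
    return tuple(idx)

# Tabular Q-values: one entry per (discrete state, action) pair;
# two discrete actions here (push left / push right at full force).
Q = np.zeros(BINS + [2])

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One standard Q-learning update on the discretized states."""
    target = r + gamma * Q[s_next].max()
    Q[s + (a,)] += alpha * (target - Q[s + (a,)])
```

With a coarse grid like this, the state space has only 6 * 6 * 12 * 12 = 5184 cells, so a plain Q-table is perfectly feasible; finer control (or the POMDP variants) is where function approximation starts to pay off.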

Start easy and work your way up to the more difficult problems!

Stefan