
I don't understand how eligibility traces fit in with reinforcement learning when using radial basis functions (RBFs) to approximate the value function with continuous state variables. In particular, how do you decide which features are 'active' for a given state?

When using tile coding, or coarse coding more generally, each tile (not each tiling) is essentially a feature, so the eligibility trace of a tile is incremented (how exactly depends on whether you're using replacing or accumulating traces) whenever the state passes through that tile, and tiles the state does not pass through are left untouched. With radial basis functions, however, each feature is the chosen kernel evaluated at the distance between the state and one of the RBF centers. These kernels can be evaluated for any state and any center, so there is no clear picture of which features are 'active' for a given state (they are all essentially active to a greater or lesser degree), and hence it's not clear which features should have their traces incremented.
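For concreteness, here is a minimal sketch of the setup I have in mind (a 1-D state, Gaussian kernels, linear value function; the centers, width, and constants are all placeholders I made up). The line I'm unsure about is the trace update, since with RBF features it touches every component:

```python
import numpy as np

# Placeholder setup: a 1-D continuous state in [0, 1] and a grid of Gaussian RBF centers.
centers = np.linspace(0.0, 1.0, 10)   # made-up centers
sigma = 0.1                            # made-up kernel width

def rbf_features(s):
    """Gaussian RBF features: every component is nonzero for any state s."""
    return np.exp(-((s - centers) ** 2) / (2.0 * sigma ** 2))

# Linear value estimate v(s) = w . phi(s) with TD(lambda)-style updates,
# as I understand them.
w = np.zeros_like(centers)             # weight vector
z = np.zeros_like(centers)             # eligibility trace vector
gamma, lam, alpha = 0.99, 0.9, 0.1     # made-up discount, trace decay, step size

s, reward, s_next = 0.3, 1.0, 0.35     # placeholder transition
phi = rbf_features(s)
delta = reward + gamma * w @ rbf_features(s_next) - w @ phi   # TD error
z = gamma * lam * z + phi              # <-- this increments EVERY feature's trace
w += alpha * delta * z
```

With binary tile-coding features, the analogous update only increments the traces of the handful of active tiles, which is why the RBF case confuses me.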

How should one adjust the eligibility traces of RBF-generated features at each time step of a simulation?

Do I need to assume the kernels of the RBFs are truncated?

