Reinforcement Learning methods that store Q-values in a matrix (or table) are referred to as tabular RL methods. These are the simplest approaches, but, as you have discovered, they are not always easily applicable.
One solution you can try is to discretize your state space by creating lots of "bins". For example, the hull_angle observation can range from 0 to 2*pi. You could, for example, map any state in which 0 < hull_angle <= 0.1 to the first bin, states with 0.1 < hull_angle <= 0.2 to the second bin, etc. If there is an observation that can range from -inf to +inf, you can simply decide to put a threshold somewhere and treat every value beyond that threshold as the same bin (e.g. everything from -inf to -10 maps to one bin, everything from 10 to +inf to another bin, and then smaller intervals for more bins in between).
You'd have to discretize every single one of the observations into such bins though (or simply throw some observations away), and the combination of all bin indices together would form a single index into your matrix. If you have 23 different observations and create, for example, 10 bins per observation, your final matrix of Q-values will have 10^23 entries, which is a... rather big number that certainly won't fit in your memory.
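To see why the table explodes, here is how per-observation bin indices would combine into one flat table index (the 23 observations and 10 bins match the numbers above; the bin indices themselves are hypothetical):

```python
n_obs, n_bins = 23, 10
bin_indices = [3] * n_obs  # hypothetical bin index for each of the 23 observations

# Treat the tuple of bin indices as digits of a base-10 number:
# each extra observation multiplies the table size by n_bins.
flat_index = 0
for b in bin_indices:
    flat_index = flat_index * n_bins + b

table_size = n_bins ** n_obs
print(f"table would need {table_size:.1e} entries")  # 1.0e+23
```

Even at one byte per entry, that is billions of terabytes, so tabular methods are simply out of reach here.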
A different solution is to look into RL methods with Function Approximation. The simplest class of such methods uses Linear Function Approximation, and those are the methods I'd recommend looking into first for your problem. Linear Function Approximation methods essentially try to learn a linear function (a vector of weights) such that your Q-values are estimated by taking the dot product between the vector of weights and your vector of observations / features.
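As a sketch of that idea, here is a semi-gradient Q-learning update with one weight vector per action (the feature count, action count, and hyperparameters are illustrative, not taken from your environment):

```python
import numpy as np

n_features, n_actions = 24, 4
weights = np.zeros((n_actions, n_features))  # one weight vector per action

def q_values(obs):
    # Q(s, a) = w_a . x(s): dot product of each action's weights with the features
    return weights @ obs

def update(obs, action, reward, next_obs, alpha=0.01, gamma=0.99):
    # Standard semi-gradient Q-learning step with linear function approximation.
    td_target = reward + gamma * np.max(q_values(next_obs))
    td_error = td_target - q_values(obs)[action]
    # The gradient of w_a . x(s) with respect to w_a is just x(s):
    weights[action] += alpha * td_error * obs

rng = np.random.default_rng(0)
obs = rng.standard_normal(n_features)
update(obs, action=0, reward=1.0, next_obs=rng.standard_normal(n_features))
```

Note that the raw observations often work poorly as features for a linear method; constructions like tile coding (also covered in the book mentioned below) are a common way to build better features.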
If you're familiar with the draft of the second edition of Sutton and Barto's Reinforcement Learning book, you'll find many such methods throughout chapters 9-12.
Another class of function approximation methods uses (deep) Neural Networks as function approximators instead of linear functions. These may work better than linear function approximation, but they are also much more complicated to understand and often take a long time to train. If you want the best results, they may be worth a look, but if you're still learning and have never seen any non-tabular RL methods yet, it's probably wise to start with simpler variants such as Linear Function Approximation.