I know how to solve a grid world using policy iteration, but how can I solve a general environment? My data looks like this:
This is part of my data. It describes the transition model. Please note that the source and destination are of type String; I don't want to create a grid world.
I don't know how to define my states or how to obtain them. Can I use a vector to store them? Defining them is the first problem.
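
To make the question concrete, here is a minimal sketch of one possible way to define the states, assuming the transition model can be read as rows of (source, action, destination, probability, reward). The rows, the state and action names, the reward column, and the helper names `build_model`, `q_value`, and `policy_iteration` are all hypothetical placeholders, not the actual data. The idea is to collect every distinct source and destination string, give each one an integer index, and store values in a plain list (a vector) addressed by that index, so no grid is needed.

```python
# Hypothetical rows: (source, action, destination, probability, reward).
ROWS = [
    ("home", "drive", "office", 0.9, 1.0),
    ("home", "drive", "home", 0.1, 0.0),
    ("home", "walk", "park", 1.0, 0.0),
    ("park", "walk", "office", 1.0, 2.0),
    ("office", "stay", "office", 1.0, 0.0),
]


def build_model(rows):
    """Collect the state names and build model[s][a] = [(prob, dest, reward)]."""
    # Every string seen as a source or a destination is a state.
    states = sorted({r[0] for r in rows} | {r[2] for r in rows})
    index = {name: i for i, name in enumerate(states)}
    model = {i: {} for i in range(len(states))}
    for src, action, dst, prob, reward in rows:
        model[index[src]].setdefault(action, []).append((prob, index[dst], reward))
    return states, index, model


def q_value(model, values, s, a, gamma):
    """Expected return of taking action a in state s, then following values."""
    return sum(p * (r + gamma * values[d]) for p, d, r in model[s][a])


def policy_iteration(model, gamma=0.9, tol=1e-8):
    values = [0.0] * len(model)  # the "vector" of state values
    # Start with the first listed action; states without actions stay at 0.
    policy = {s: next(iter(acts)) for s, acts in model.items() if acts}
    while True:
        # Policy evaluation: sweep until the values stop changing.
        while True:
            delta = 0.0
            for s, a in policy.items():
                new_v = q_value(model, values, s, a, gamma)
                delta = max(delta, abs(new_v - values[s]))
                values[s] = new_v
            if delta < tol:
                break
        # Policy improvement: switch only when an action is strictly better.
        stable = True
        for s in policy:
            best = max(model[s], key=lambda a: q_value(model, values, s, a, gamma))
            if q_value(model, values, s, best, gamma) > values[s] + 1e-9:
                policy[s] = best
                stable = False
        if stable:
            return values, policy


states, index, model = build_model(ROWS)
values, policy = policy_iteration(model)
# Translate the integer indices back to the original string names.
named_policy = {states[s]: a for s, a in policy.items()}
print(named_policy)
```

The dictionary `index` is the only place where strings and integers meet, so the rest of the algorithm is the same as in the grid world case; only the lookup of neighbours changes from coordinates to the transition table.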