0

I have a simulation that ticks the time every 5 seconds. I want to use OpenAI and its baselines algorithms to perform learning in this environment. For that I'd like to adapt the simulation by writing some adapter code that corresponds to the OpenAI Env API. But there is a problem: The flow of control is defined by the Agent in the OpenAI setting. But in my world, the environment steps, independent of the agent. If the agent doesn't decide or is not fast enough, the world just keeps going without him. How would one achieve this reversal of triggering the next step?

In short: OpenAI Env gets stepped by the agent. My environment gives my agent about 2-3 seconds to decide and then just tells it what's new, again offering to make choice to act or not.

As an example: My environment is rather similar to a real world stock trading market. The agent gets 24 chances to buy / sell products for a certain limit price to accumulate a certain volume for that target time and at time step 24, the reward is given to the agent and the slot is completed. The reward is based on the average price paid per item in comparison to the average price by all market participants.

At any given moment, 24 slots are traded in parallel (a 24x parallel trading of futures). I believe for this I need to create 24 environments which leads me to believe A3C would be a good choice.

pascalwhoop
  • 2,984
  • 3
  • 26
  • 40

1 Answers1

0

After re-reading the question, it seems like OpenAI gym is not a great fit for what you’re trying to do. It is designed for running rapid experiments, which cannot be done efficiently if you are waiting on live events to occur. If you have no historical data and can only train on incoming live data, there is no point to using OpenAI gym. You can write your own code to represent the environment from that data, and that would be easier than trying to morph it into another framework, although OpenAI gym’s API does provide a good model for how your environment should work.

R.F. Nelson
  • 2,254
  • 2
  • 12
  • 24
  • Yes I am leaning towards that direction. Adopting the Environment API I believe is beneficial so I can make use of the baselines algorithms as well as some other implementations that assume they are interacting with the Environment API. Is it just me or does the whole area around these tools feel a little "quick" meaning it's been coded by some geniuses without much documentation, tests or a vivid community discussing their implementations using these tools. Or maybe I just lack enough understanding ... – pascalwhoop May 19 '18 at 20:12