I would like to train a Proximal Policy Optimization (PPO) model using RLlib and then serve the action distribution model using TensorFlow Lite or the equivalent PyTorch technology. I am interested in the ranking of all actions (the probabilities or logits the network assigns to each one), not just the single action the policy selects.

Is there a way to extract this network from the trained policy? This tutorial shows how to train a PPO algorithm. Is there a simple mechanism to extract the desired neural network from the `algo` object so that I can convert it to a form suitable for efficient serving in an AWS Lambda function?
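
To make the goal concrete, here is a minimal sketch of the output I am after, assuming a discrete action space and assuming the extracted network returns one logit per action. The function `rank_actions` and the example logits are hypothetical and not part of RLlib; how to obtain those logits from the trained policy is the open question.

```python
import math


def rank_actions(logits):
    """Return (action_index, probability) pairs, most likely action first."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort action indices by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in order]


# Hypothetical logits for a 4-action discrete space (not real model output).
example_logits = [0.5, 2.0, -1.0, 1.0]
ranking = rank_actions(example_logits)
for action, prob in ranking:
    print(f"action {action}: p = {prob:.3f}")
```

Ideally the served model would expose the logits directly so that a ranking step like this can run inside the Lambda handler.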

Setjmp

0 Answers