I'm currently dealing with the challenge of serving my TensorFlow models in a scalable way. As far as I know, the recommended solution is the standard TensorFlow ModelServer. It handles common requirements well - but I want more. I want to reduce the amount of transferred data by passing a parameter like "limit" that defines the top n logits + probabilities to return.
During my research I identified the following solutions:
1) Create a more advanced SignatureDef during model building.
2) Customize the basic tensorflow/serving project to add the mentioned functionality.
3) Serve the model with the standard TensorFlow ModelServer and build a postprocessing service that restructures or filters the result in the predefined way.
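For context, here is roughly what I imagine option 1 would look like - a sketch assuming TF 2.x, where the model, shapes, and signature names are just placeholders (my real model would replace the toy linear layer):

```python
import tempfile
import tensorflow as tf


class TopKServing(tf.Module):
    """Wraps a model with a serving signature that accepts a `limit`
    input and returns only the top-n classes and probabilities."""

    def __init__(self, num_features=4, num_classes=10):
        super().__init__()
        # Placeholder weights standing in for a real trained model
        self.w = tf.Variable(tf.random.normal([num_features, num_classes]))
        self.b = tf.Variable(tf.zeros([num_classes]))

    @tf.function(input_signature=[
        tf.TensorSpec([None, 4], tf.float32, name="inputs"),
        tf.TensorSpec([], tf.int32, name="limit"),
    ])
    def serve(self, inputs, limit):
        logits = tf.matmul(inputs, self.w) + self.b
        probs = tf.nn.softmax(logits, axis=-1)
        # tf.math.top_k accepts a scalar tensor for k,
        # so the client controls n per request
        top_probs, top_classes = tf.math.top_k(probs, k=limit)
        return {"probabilities": top_probs, "classes": top_classes}


module = TopKServing()
export_dir = tempfile.mkdtemp()
tf.saved_model.save(module, export_dir,
                    signatures={"serving_default": module.serve})
```

The idea is that the exported SignatureDef then has `limit` as a regular input tensor, so a client could request e.g. only the top 3 classes - but I'm not sure whether this is the idiomatic approach.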
Can someone more experienced than me go into some detail regarding my question? Code snippets or links would be awesome.
Thanks in advance.