
Is it possible to configure Triton Server to serve multiple models with different input shapes in such a way that a single "collective" request (the union of their feature lists) can service all of these models, instead of multiple requests - one per deployed model? Presumably this would have to be a JSON request, since we could no longer rely on the order of unnamed inputs as we can with NumPy arrays / tensors.
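For concreteness, the kind of request I have in mind follows Triton's KServe v2 HTTP/JSON inference protocol, where inputs are matched by name rather than by position. A sketch (the model name and feature names here are hypothetical):

```json
POST /v2/models/collective_model/infer
{
  "inputs": [
    { "name": "feature_a", "shape": [1, 1], "datatype": "FP32", "data": [0.5] },
    { "name": "feature_b", "shape": [1, 1], "datatype": "FP32", "data": [1.2] },
    { "name": "feature_c", "shape": [1, 1], "datatype": "FP32", "data": [3.7] }
  ]
}
```

Each downstream model would then, ideally, pick up only the named inputs it declares and ignore the rest.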

This could yield significant performance improvements in our use case due to the large (~90%) overlap of the feature lists among the deployed models.

From the information I've collected, it seems this would only be possible in the special case where all models have the same inputs (shapes and feature names). In that case one could set up an ensemble (an extra meta-model of "ensemble" platform type) that redistributes the input data to all deployed models in parallel, as defined in the ensemble_scheduling section of the config file.
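For that special case, the fan-out could be expressed in the ensemble's config.pbtxt roughly like this (a sketch, assuming two deployed models "model_a" and "model_b" that both accept an identical input tensor; all tensor and model names are made up). The two steps have no data dependency on each other, so Triton can run them in parallel:

```protobuf
name: "fanout_ensemble"
platform: "ensemble"
input [
  { name: "FEATURES", data_type: TYPE_FP32, dims: [ 10 ] }
]
output [
  { name: "SCORE_A", data_type: TYPE_FP32, dims: [ 1 ] },
  { name: "SCORE_B", data_type: TYPE_FP32, dims: [ 1 ] }
]
ensemble_scheduling {
  step [
    {
      model_name: "model_a"
      model_version: -1
      input_map { key: "INPUT", value: "FEATURES" }
      output_map { key: "OUTPUT", value: "SCORE_A" }
    },
    {
      model_name: "model_b"
      model_version: -1
      input_map { key: "INPUT", value: "FEATURES" }
      output_map { key: "OUTPUT", value: "SCORE_B" }
    }
  ]
}
```

But this only works because both steps map the same ensemble tensor "FEATURES" to their inputs; it does not cover the case I'm asking about, where the models' feature lists overlap heavily but are not identical.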

mirekphd
