Is it possible to configure Triton Server to serve multiple models with different input shapes so that a single "collective" request (the union of their feature lists) can service all of these models, instead of multiple requests — one per deployed model? Presumably this would have to be a JSON request, since we could no longer rely on the positional order of unnamed inputs as we can with numpy arrays / tensors.
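To make the idea concrete, here is a minimal sketch of what such a request could look like using the KServe v2 HTTP/JSON protocol that Triton speaks, where inputs are addressed by name rather than by position. The tensor names (`feature_a`, `feature_b`) are hypothetical placeholders, not names from any real deployment:

```python
import json

# Sketch of a KServe v2 inference request body with *named* inputs,
# so ordering no longer matters and each model could, in principle,
# pick out only the tensors it needs.
request_body = {
    "inputs": [
        {
            "name": "feature_a",   # hypothetical feature name
            "shape": [1, 3],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3],
        },
        {
            "name": "feature_b",   # hypothetical feature name
            "shape": [1, 1],
            "datatype": "INT64",
            "data": [42],
        },
    ]
}

# This would be POSTed to an endpoint of the form
# http://<host>:8000/v2/models/<model_name>/infer
payload = json.dumps(request_body)
print(payload)
```

Note that per the v2 protocol, a single `/infer` call still targets one model (or one ensemble), which is why the question below about fan-out arises.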
This could yield significant performance improvements in our use case, because the feature lists of the deployed models overlap heavily (around 90%).
From the information I've collected so far, it seems this is only possible in the special case where all models have the same inputs (shapes and feature names). In that case one could set up an ensemble (an extra meta-model with platform type "ensemble") that redistributes the input data to all deployed models in parallel, as defined in the ensemble_scheduling section of the config file.