Is it possible to configure Triton Server to serve multiple models with different input shapes so that a single "collective" request (the union of their feature lists) can service all of these models, instead of multiple requests — one per deployed model? Presumably this would have to be a JSON request, since we could no longer rely on the positional order of unnamed inputs as we can with numpy arrays / tensors.
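To make the idea concrete, here is a minimal sketch of what such a request could look like using the KServe v2 HTTP/JSON protocol that Triton speaks, where inputs are addressed by name rather than by position. The tensor names (`feature_a`, `feature_b`) are hypothetical placeholders, not names from any real deployment:

```python
import json

# Sketch of a KServe v2 inference request body with *named* inputs,
# so ordering no longer matters and each model could, in principle,
# pick out only the tensors it needs.
request_body = {
    "inputs": [
        {
            "name": "feature_a",   # hypothetical feature name
            "shape": [1, 3],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3],
        },
        {
            "name": "feature_b",   # hypothetical feature name
            "shape": [1, 1],
            "datatype": "INT64",
            "data": [42],
        },
    ]
}

# This would be POSTed to an endpoint of the form
# http://<host>:8000/v2/models/<model_name>/infer
payload = json.dumps(request_body)
print(payload)
```

Note that per the v2 protocol, a single `/infer` call still targets one model (or one ensemble), which is why the question below about fan-out arises.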
This could yield significant performance improvements in our use case, because the feature lists of the deployed models overlap heavily (around 90%).
From the information I've collected so far, it seems this is only possible in the special case where all models have the same inputs (shapes and feature names). In that case one could set up an ensemble (an extra meta-model with platform type "ensemble") that redistributes the input data to all deployed models in parallel, as defined in the ensemble_scheduling section of the config file.