I am working with multiple character-specific voices deployed on a Triton instance. There are not enough resources to keep all of them loaded simultaneously, so I currently trigger a model load/unload manually each time the service receives a request.
How would you manage/schedule a large number of models? Suggestions for any ML tools are also welcome.
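
To make the question concrete, here is a minimal sketch of one possible policy: keep at most N models loaded and evict the least recently used one when a request needs a model that is not resident. The class name `LRUModelScheduler`, the `capacity` parameter, and the `load`/`unload` callables are all hypothetical names introduced for illustration. With Triton they could plausibly be wired to `load_model` and `unload_model` on a `tritonclient` `InferenceServerClient`, assuming the server runs with `--model-control-mode=explicit`. The sketch ignores concurrency and in-flight requests.

```python
from collections import OrderedDict
from typing import Callable, List


class LRUModelScheduler:
    """Keeps at most `capacity` models loaded, evicting the least recently used.

    `load` and `unload` are hypothetical hooks; in a Triton setup they might
    wrap the client's load_model / unload_model calls (an assumption).
    """

    def __init__(
        self,
        capacity: int,
        load: Callable[[str], None],
        unload: Callable[[str], None],
    ) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._load = load
        self._unload = unload
        # Insertion order doubles as recency order: oldest entry first.
        self._loaded: "OrderedDict[str, None]" = OrderedDict()

    def ensure_loaded(self, name: str) -> None:
        """Make sure `name` is resident before a request is served."""
        if name in self._loaded:
            # Already resident: just mark it as most recently used.
            self._loaded.move_to_end(name)
            return
        # Free a slot by unloading the least recently used model(s).
        while len(self._loaded) >= self.capacity:
            victim, _ = self._loaded.popitem(last=False)
            self._unload(victim)
        self._load(name)
        self._loaded[name] = None

    def loaded_models(self) -> List[str]:
        """Resident models, least recently used first."""
        return list(self._loaded)


if __name__ == "__main__":
    # Stand-in hooks that only record what would be sent to the server.
    events = []
    scheduler = LRUModelScheduler(
        capacity=2,
        load=lambda n: events.append(("load", n)),
        unload=lambda n: events.append(("unload", n)),
    )
    for voice in ["voice_a", "voice_b", "voice_a", "voice_c"]:
        scheduler.ensure_loaded(voice)
    print(events)
    print(scheduler.loaded_models())
```

A policy like this only avoids reloads when a few voices account for most of the traffic; if requests are spread evenly over many voices, it degrades to the same load-per-request behavior described above.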