I would like to fine-tune a model (e.g. Stable Diffusion) specifically to a user's input, then persist that tuned model to cloud storage such as S3. Later, I want to be able to download it back into memory for inference at any time. How would I go about that setup? Also, are there serverless setups that would make this autoscalable? (Correct me at any level if I'm making wrong assumptions.)
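To make the storage side of the question concrete, here is a minimal sketch of the "save the tuned model persistently" step. It assumes the fine-tuned model has already been saved to a local directory (e.g. via `save_pretrained` in Hugging Face libraries); the bucket name, key layout, and helper names (`make_model_key`, `archive_model_dir`, `upload_model`) are illustrative assumptions, not an established API.

```python
# Hypothetical sketch: package a fine-tuned model directory and store it per user.
# Bucket/key names and helper names are assumptions for illustration only.
import tarfile
from pathlib import Path

def make_model_key(user_id: str, model_name: str) -> str:
    """Build a deterministic object key so each user's tuned model is addressable."""
    return f"finetuned/{user_id}/{model_name}.tar.gz"

def archive_model_dir(model_dir: str, out_path: str) -> str:
    """Tar+gzip the saved model directory (e.g. the output of save_pretrained)."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(model_dir, arcname=Path(model_dir).name)
    return out_path

def upload_model(archive_path: str, bucket: str, key: str) -> None:
    """Upload the archive to S3; requires boto3 and AWS credentials at runtime."""
    import boto3  # imported lazily so the rest of the sketch runs without AWS
    boto3.client("s3").upload_file(archive_path, bucket, key)

def download_model(bucket: str, key: str, dest_path: str) -> None:
    """Fetch the archive back before loading it for inference."""
    import boto3
    boto3.client("s3").download_file(bucket, key, dest_path)
```

A deterministic key per `(user_id, model_name)` pair means the inference service never needs a separate lookup table to find a user's model.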
This is a complicated question. Even fine-tuning a model will require a decent amount of resources. Most serverless solutions are intended for very simple, lightweight tasks, so serverless probably isn't much of an option. There are plenty of autoscaling solutions that *aren't* serverless (e.g., GCP Compute Engine / AWS EC2 instances; you could limit each VM instance to just a couple of connections, for instance). However, if you need to carefully manage a lot of expensive resources (e.g., carefully allocating user-varying GPU memory), you may need to combine a few technologies or roll your own. – Alexander Guyer Nov 01 '22 at 17:13
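For the question's "(down)load it back into memory for inference" step, a minimal inference-side sketch might look like the following. It assumes the stored artifact is a gzipped tar of a directory in the Hugging Face `diffusers` layout; `extract_model` and `load_pipeline` are hypothetical helper names, and actually loading the pipeline requires `diffusers` (and typically a GPU) at runtime.

```python
# Hypothetical inference-side sketch: extract a downloaded model archive,
# then load it with diffusers. Helper names are assumptions for illustration.
import tarfile
from pathlib import Path

def extract_model(archive_path: str, dest_dir: str) -> str:
    """Unpack the archive; assumes it contains a single top-level model directory."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    return str(next(Path(dest_dir).iterdir()))

def load_pipeline(model_dir: str):
    """Load the extracted directory as a Stable Diffusion pipeline."""
    from diffusers import StableDiffusionPipeline  # lazy import; needs diffusers installed
    return StableDiffusionPipeline.from_pretrained(model_dir)
```

In an autoscaled (non-serverless) setup like the one the comment describes, each VM instance would run this extract-and-load step on startup or on first request for a given user, then cache the pipeline in memory for subsequent requests.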