I was working on a proof of concept that exposes Spark MLlib training and prediction serving to multiple tenants through some form of REST interface. I did get a POC up and running, but it seems wasteful: it has to create numerous Spark contexts, and therefore JVMs, to execute in. Is there a way around that, or a cleaner solution, given Spark's one-context-per-JVM restriction?
There are two parts to it:
Training: trigger training of a specified jar per tenant, with per-tenant restrictions such as executor size. This is pretty much out of the box with Spark Job Server (sadly it doesn't yet seem to support OAuth, but there is a way around that). For this part I don't think it's possible to share a context between tenants, because they should be able to train in parallel, and as far as I know a single MLlib context will execute two training requests sequentially.
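For reference, the per-tenant Job Server calls I had in mind look roughly like this. The endpoint shapes follow spark-jobserver's documented REST API (a context is created with `POST /contexts/<name>` and a job submitted with `POST /jobs`); the host, port, tenant names, resource limits, and jar class are placeholders of mine, not anything fixed by the project:

```python
# Sketch: building per-tenant spark-jobserver REST calls. The endpoint
# layout follows spark-jobserver's API; host/port, tenant limits, and the
# job class below are hypothetical placeholders.
import urllib.parse

JOBSERVER = "http://localhost:8090"  # assumed jobserver address

def create_context_url(tenant, cores, memory_per_node):
    # POST /contexts/<name>?num-cpu-cores=..&memory-per-node=.. creates a
    # long-lived context; per-tenant resource limits go in the query string.
    params = urllib.parse.urlencode({
        "num-cpu-cores": cores,
        "memory-per-node": memory_per_node,
    })
    return f"{JOBSERVER}/contexts/{tenant}-training?{params}"

def submit_job_url(tenant, app_name, class_path):
    # POST /jobs?appName=..&classPath=..&context=.. runs the tenant's jar
    # inside that tenant's dedicated context.
    params = urllib.parse.urlencode({
        "appName": app_name,
        "classPath": class_path,
        "context": f"{tenant}-training",
    })
    return f"{JOBSERVER}/jobs?{params}"

print(create_context_url("acme", 2, "512m"))
print(submit_job_url("acme", "acme-app", "com.example.TrainJob"))
```

The point being: each tenant gets its own context (and resource caps) so training runs in parallel, which is exactly what makes the setup expensive.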
Serving: this is trickier, and I can't seem to find a good way to do it. Once a model has been trained, we need to load it into some kind of REST service and expose it. As things stand, that also means allocating a Spark context per tenant, hence a full JVM per tenant just to serve predictions, which is quite wasteful.
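One direction I've been considering, sketched below: for linear MLlib models the learned parameters are just a weight vector and an intercept, so single-vector prediction is a local dot product and shouldn't need a SparkContext at all at serving time. The export format and the per-tenant model registry here are my own assumptions, not MLlib API:

```python
# Sketch: serving predictions without a per-tenant SparkContext, assuming
# the trained model's coefficients have been exported. The arithmetic below
# mirrors what a logistic-regression model does for one dense vector:
# sigmoid(w . x + b), thresholded at 0.5.
import math

def predict_logistic(weights, intercept, features):
    margin = sum(w * x for w, x in zip(weights, features)) + intercept
    prob = 1.0 / (1.0 + math.exp(-margin))
    return 1.0 if prob > 0.5 else 0.0

# A single lightweight registry of exported coefficients could serve every
# tenant from one process, instead of one SparkContext/JVM per tenant.
models = {
    "tenant-a": ([0.5, -1.2], 0.1),  # hypothetical exported (weights, intercept)
}

w, b = models["tenant-a"]
print(predict_logistic(w, b, [1.0, 0.0]))
```

If something like this holds up, the REST layer only needs Spark for training, not for serving, but I'm not sure how well it generalizes beyond linear models.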
Any feedback on how this could be improved or re-architected to be less resource hungry? Maybe there are Spark features I'm not aware of that would facilitate that. Thanks!