I have a Spark Streaming application that trains a model and periodically stores it to HDFS. From an HTTP-based web service, I would like to POST some feature values and get back a prediction for them. The service should also reload the model on demand (e.g. via a GET request).
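For context, the training side looks roughly like this. It is a simplified sketch: the model type (`StreamingLogisticRegressionWithSGD`), the input source, the feature dimension, and the HDFS paths are placeholders for what my real job does:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.mllib.classification.StreamingLogisticRegressionWithSGD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

object TrainingJob extends App {
  val ssc = new StreamingContext(new SparkConf().setAppName("model-trainer"), Seconds(60))

  // Placeholder source; the real job consumes a different stream.
  val trainingData = ssc.textFileStream("hdfs:///training/in").map(LabeledPoint.parse)

  val model = new StreamingLogisticRegressionWithSGD()
    .setInitialWeights(Vectors.zeros(3)) // feature dimension is illustrative

  model.trainOn(trainingData)

  // After each batch, persist the latest model to HDFS. In practice I'd
  // then repoint some "current" alias that the serving side reads.
  trainingData.foreachRDD { (_, time) =>
    model.latestModel().save(ssc.sparkContext, s"hdfs:///models/model-${time.milliseconds}")
  }

  ssc.start()
  ssc.awaitTermination()
}
```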
I implemented a web server with Spark and Spray, and it works as a proof of concept. But I'm not sure it is a good design. How would I expose the web server to external services if it runs on a cluster? How can I control on which node the service will be available? I'm not even sure whether serving prediction models this way is the right idea at all. Maybe the best practice is to embed Spark in a standalone application and load the model from the shared filesystem (e.g. HDFS), but then I lose cluster support, don't I?
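Simplified, my proof of concept looks something like this (the model type, the HDFS path, the port, and the comma-separated request body format are all placeholders):

```scala
import akka.actor.ActorSystem
import spray.routing.SimpleRoutingApp
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.linalg.Vectors

object PredictionService extends App with SimpleRoutingApp {
  implicit val system = ActorSystem("prediction-service")

  val sc = new SparkContext(new SparkConf().setAppName("prediction-service"))
  val modelPath = "hdfs:///models/current" // placeholder path

  // Replaced atomically on reload; concurrent requests see either the
  // old or the new model, never a partially loaded one.
  @volatile var model: LogisticRegressionModel =
    LogisticRegressionModel.load(sc, modelPath)

  startServer(interface = "0.0.0.0", port = 8080) {
    path("predict") {
      post {
        // Body is assumed to be comma-separated doubles, e.g. "1.0,2.0,3.0".
        extract(_.request.entity.asString) { body =>
          val features = Vectors.dense(body.split(",").map(_.toDouble))
          complete(model.predict(features).toString)
        }
      }
    } ~
    path("reload") {
      get {
        model = LogisticRegressionModel.load(sc, modelPath)
        complete("model reloaded")
      }
    }
  }
}
```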
Summary: what is the best-practice design for building a prediction web service with Apache Spark?