Any suggestions or best practices for persisting and reusing trained machine learning models? I'm developing models in Python or R. These models must then be used in a production workflow for scoring, where R is not available. For example, there could be a logistic regression model trained in R, and new observations need to be scored against it. The scoring engine must be fast and scalable. I've thought of the following options:
PMML (http://en.wikipedia.org/wiki/Predictive_Model_Markup_Language). It is easy to convert most models developed in R to PMML. However, I couldn't find a useful scoring engine for PMML models. For example, there is Augustus (https://code.google.com/p/augustus/), but it implements only 3-4 model types so far.
Serialize the models using pickle in Python and write the consumer in Python.
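To make the pickle option concrete, here is a minimal sketch of what I have in mind. The `model` dict and the `score` function are hypothetical stand-ins for a fitted logistic regression and its consumer, and the coefficients are made up for illustration. A real estimator object would be pickled the same way, but I assume the consumer would then need the same library (and probably the same version) installed.

```python
import math
import pickle

# Hypothetical stand-in for a fitted logistic regression: only its parameters.
# The numbers are invented for illustration, not taken from a real model.
model = {"intercept": -1.0, "coef": [2.0, -0.5]}

# Producer side: serialize the trained model to bytes.
# (pickle.dump(model, f) would write it to an open binary file instead.)
blob = pickle.dumps(model, protocol=pickle.HIGHEST_PROTOCOL)


def score(model, x):
    """Return the predicted probability for one observation x."""
    z = model["intercept"] + sum(w * v for w, v in zip(model["coef"], x))
    return 1.0 / (1.0 + math.exp(-z))


# Consumer side: restore the model and score a new observation.
loaded = pickle.loads(blob)
p = score(loaded, [1.0, 2.0])
print(p)
```

One caveat with this route: unpickling can execute arbitrary code, so the consumer should only load pickles that come from a trusted source.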
Any thoughts or suggestions on the right approach?