Problem
How can I dump a pickle object with its own dependencies?
The pickle object is generally generated from a notebook.
I tried creating a virtualenv for the notebook to track dependencies. However, that way I don't get only the imports of the pickle object, but also many others that are used elsewhere in the application. That is good enough, but not the best solution.
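To make "only the imports of the pickle object" concrete, here is a rough sketch of the kind of information I am after. It assumes the pickle can be loaded in the notebook environment and records every module the pickle stream references through `pickle.Unpickler.find_class`. The names `RecordingUnpickler` and `top_level_modules` are made up for illustration. It only sees modules the pickle refers to directly, not their transitive dependencies, and it should only be run on trusted pickles because it really loads them.

```python
import io
import pickle


class RecordingUnpickler(pickle.Unpickler):
    """Unpickler that remembers which modules the pickle stream refers to."""

    def __init__(self, file):
        super().__init__(file)
        self.modules = set()

    def find_class(self, module, name):
        # Called for every global (class or function) the pickle references.
        # Keep only the top-level package, e.g. "sklearn" for "sklearn.svm".
        self.modules.add(module.split(".")[0])
        return super().find_class(module, name)


def top_level_modules(pickle_bytes):
    """Load the pickle and return the set of top-level modules it referenced."""
    unpickler = RecordingUnpickler(io.BytesIO(pickle_bytes))
    unpickler.load()
    return unpickler.modules
```

For example, `top_level_modules(pickle.dumps(model))` on a scikit-learn model would be expected to include `sklearn`; mapping those module names to pinned package versions would still be a separate step.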
Background
What I'm trying to achieve
I'm trying to build an MLOps flow. Quick explanation: MLOps is a buzzword that's synonymous with DevOps for machine learning. There are different PaaS/SaaS solutions for it offered by different companies, and they commonly solve the following problems:
- Automation of creating web APIs from models
- Handling requirements/dependencies
- Storing & running scripts used for model generation, model binary and data sets.
I'll skip the storage part and focus on the first two.
How I'm trying to achieve it
In my case I'm trying to set up this flow using good old TeamCity, where models are pickle objects generated by scikit-learn. The requirements are:
- The dependencies must be explicitly defined
- Pickle objects from libraries other than scikit-learn must be supported.
- The workflow for a data scientist will look like this:
  - The data scientist uploads the pickle model together with its requirements.txt.
  - The data scientist commits a definition file which looks like this:

        apiPort: 8080
        apiName: name-tagger
        model: model-repository.internal/model.pickle
        requirements: model-repository.internal/model.requirements
        predicterVersion: 1.0

    where predicter is a Flask application with its own requirements.txt. It's an API wrapper/layer around a pickle model that loads the model into memory and serves predictions from a REST endpoint.
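As an illustration of how the definition file could be consumed, here is a minimal sketch that reads it into a typed object. It assumes the file stays a flat list of `key: value` lines as in the example; anything nested would need a real YAML parser. `ModelDefinition` and `parse_definition` are hypothetical names, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass
class ModelDefinition:
    api_port: int
    api_name: str
    model: str
    requirements: str
    predicter_version: str


def parse_definition(text):
    """Parse flat `key: value` lines; nested YAML is deliberately not handled."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return ModelDefinition(
        api_port=int(fields["apiPort"]),
        api_name=fields["apiName"],
        model=fields["model"],
        requirements=fields["requirements"],
        # Kept as a string so that "1.0" is not turned into a float.
        predicter_version=fields["predicterVersion"],
    )
```

A missing key raises `KeyError`, which seems like reasonable behaviour for a build step that should fail fast on an incomplete definition.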
Then a build configuration in TeamCity picks up the file and executes the following steps:
- Parse the definition file.
- Find the predicter code.
- Copy the pickle model as model.pickle into the predicter application's root folder.
- Merge the requirements.txt of the predicter with the requirements.txt of the pickle model.
- Create a virtualenv, install the dependencies, and push it as a wheel.
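The merge step in the list above could be sketched roughly as follows. This is a simplified illustration, not a full requirements parser: it assumes plain `name` or `name==version` style lines, rejects options such as `-r` or `-e`, and treats any difference between the two files for the same package as a conflict. `merge_requirements` is a hypothetical helper name.

```python
import re

# A requirement line is assumed to start with the package name.
_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def _parse(lines):
    """Map normalized package name -> requirement line, skipping comments and blanks."""
    parsed = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        match = _NAME.match(line)
        if match is None:
            raise ValueError(f"unsupported requirement line: {raw!r}")
        # Normalize so that "Scikit_Learn" and "scikit-learn" compare equal.
        name = re.sub(r"[-_.]+", "-", match.group(0)).lower()
        parsed[name] = line
    return parsed


def merge_requirements(predicter_lines, model_lines):
    """Union of both files; raise if the same package is specified differently."""
    merged = _parse(predicter_lines)
    for name, line in _parse(model_lines).items():
        if name in merged and merged[name] != line:
            raise ValueError(f"conflicting requirements: {merged[name]} vs {line}")
        merged[name] = line
    return [merged[name] for name in sorted(merged)]
```

Failing on conflicts rather than silently picking one side keeps the "dependencies must be explicitly defined" requirement honest: a model pinned against one version of a library should not end up served with another.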
As the output of the flow, I have a package containing a REST API that consumes a pickle model and exposes it on the defined port.