
I am trying to use a private Python package in a model built with mlflow.pyfunc.PythonModel.

My conda.yaml looks like this:

channels:
- defaults
dependencies:
- python=3.10.4
- pip
- pip:
  - mlflow==2.1.1
  - pandas
  - --extra-index-url <private-pypa-repo-link>
  - <private-package>
name: model_env

python_env.yaml

python: 3.10.4
build_dependencies:
- pip==23.0
- setuptools==58.1.0
- wheel==0.38.4
dependencies:
- -r requirements.txt

requirements.txt

mlflow==2.1.1
pandas
--extra-index-url <private-pypa-repo-link>
<private-package>

When running the following:

import mlflow
model_uri = '<run_id>'


# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(model_uri)

# Predict on a Pandas DataFrame.
import pandas as pd
t = loaded_model.predict(pd.read_json("test.json"))
print(t)

The result is:

WARNING mlflow.pyfunc: Encountered an unexpected error (InvalidRequirement('Parse error at "\'--extra-\'": Expected W:(0-9A-Za-z)')) while detecting model dependency mismatches. Set logging level to DEBUG to see the full traceback.

Adding the following before loading the model makes it work:

# get_model_dependencies returns the path to the model's requirements.txt
dep = mlflow.pyfunc.get_model_dependencies(model_uri)
print(dep)

import subprocess
import sys

# Install the model's requirements into the current environment
subprocess.check_call([sys.executable, "-m", "pip", "install", "-r", dep])

Is there a way to automatically install these dependencies rather than doing it explicitly? What are my options for getting MLflow to install the private package?


1 Answer

Answering my own question here. It turns out the issue is that I was relying on the keyring library, which needs to be pre-installed and is not supported when doing inference in a virtual environment. There are ways to work around it, though.

  1. Add the authentication token to the extra-index-url itself (a sketch of what that line looks like follows the code below). You can find this documented in this Stack Overflow question.

  2. MLflow allows you to log code dependencies with the model itself using the code_path argument (link). Using this method, you can skip adding your private package as a requirement. This question also touches on the same topic. The code would look a bit like this:

import os

import mlflow

mlflow.pyfunc.save_model(
    path=dest_path,
    python_model=MyModel(),
    artifacts=_get_artifact_dict(t_dir),
    conda_env=conda_env,
    # Add the current script file as a code dependency
    code_path=[os.path.realpath(__file__)],  # add any other scripts to this list
)
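
For reference, with the first approach the --extra-index-url line in requirements.txt carries the credentials inline. A minimal sketch, assuming the index accepts basic auth and that <private-pypa-repo-link> here stands for the index URL without the https:// prefix (the username and token are placeholders):

mlflow==2.1.1
pandas
--extra-index-url https://<username>:<token>@<private-pypa-repo-link>
<private-package>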

Opt for the first approach if storing the authentication token in requirements.txt is feasible; otherwise use the second approach. The downside of the code_path solution is that your package's code is replicated with each model.
