
I just started learning mlflow and wanted to know how to pass multiple values to each parameter in the mlflow run command.

The objective is to pass a dictionary to GridSearchCV as a param_grid to perform cross validation.

In my main code, I retrieve the command-line parameters with argparse. Adding nargs='+' to add_argument() lets me pass space-separated values for each hyperparameter, and applying vars() to the parsed arguments then gives me the dictionary. See the code below:

import argparse

from sklearn.ensemble import RandomForestClassifier

# Build the parameters for the command line
param_names = list(RandomForestClassifier().get_params().keys())

# Param types in the same order they appear in param_names by using get_params()
# (this list has to match the parameters of the installed scikit-learn version)
param_types = [bool, float, dict, str, int, float, int, float, float, float,
               float, float, float, int, int, bool, int, int, bool]

# Allow for only optional command-line arguments
parser = argparse.ArgumentParser()
grid_group = parser.add_argument_group('param_grid_group')
for i, p in enumerate(param_names):
    # nargs='+' collects one or more space-separated values into a list
    grid_group.add_argument(f'--{p}', type=param_types[i], nargs='+')

# Create a param_grid to be passed to GridSearchCV
# (arguments not given on the command line are None at this point)
param_grid_unprocessed = vars(parser.parse_args())

This works well with the plain python command:

python my_code.py --max_depth 2 3 4 --n_estimators 400 600 1000

As I said, I can simply write space-separated values for each hyperparameter; the code above groups the values into lists and yields the dictionary below, which I can then pass to GridSearchCV:

{'max_depth':[2, 3, 4], 'n_estimators':[400, 600, 1000]}
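To make that behaviour concrete, here is a minimal self-contained sketch restricted to three hyperparameters. The explicit list passed to parse_args() only stands in for the real command line, and dropping the None entries is an assumption about how the unprocessed dictionary gets cleaned up before it reaches GridSearchCV.

```python
import argparse

parser = argparse.ArgumentParser()
# nargs='+' gathers one or more space-separated values into a list
parser.add_argument('--max_depth', type=int, nargs='+')
parser.add_argument('--n_estimators', type=int, nargs='+')
parser.add_argument('--min_samples_split', type=int, nargs='+')

# Stand-in for: python my_code.py --max_depth 2 3 4 --n_estimators 400 600 1000
args = parser.parse_args(
    ['--max_depth', '2', '3', '4', '--n_estimators', '400', '600', '1000']
)

# vars() turns the Namespace into a dict; options that were not given are None
raw = vars(args)

# Assumed clean-up step: keep only the hyperparameters that were actually passed
param_grid = {name: values for name, values in raw.items() if values is not None}
print(param_grid)
```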

However, I can't get this to work with the mlflow run command, which seems to accept only one value per parameter. Here's my MLproject file:

name: mlflow_project

conda_env: conda.yml

entry_points:

  main:
    parameters:
      max_depth: int
      n_estimators: int
    command: "python my_code.py --max_depth {max_depth} --n_estimators {n_estimators}"

So this works:

mlflow run . -P max_depth=2 -P n_estimators=400

But this does not:

mlflow run . -P max_depth=[2, 3, 4] -P n_estimators=[400, 600, 1000]

According to the documentation, this does not seem to be possible. Is there any hack to work around this problem?

Thank you in advance!

Downforu

1 Answer


I've been working around this issue by passing file names as parameters and loading the information from the file in my script. It's not ideal, but it works. I'm curious to see what others have tried.
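A rough sketch of what that file-based workaround could look like, assuming the grid is stored as JSON. The --param_grid_file option and the load_param_grid helper are hypothetical names chosen for illustration; they are not part of MLflow or of the code in the question.

```python
import argparse
import json


def load_param_grid(path):
    """Read a JSON file mapping hyperparameter names to candidate values."""
    with open(path) as f:
        grid = json.load(f)
    # GridSearchCV expects a list of candidates per key, so wrap single values
    return {
        name: values if isinstance(values, list) else [values]
        for name, values in grid.items()
    }


def main(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical option: a single file name instead of one option per hyperparameter
    parser.add_argument('--param_grid_file', required=True)
    args = parser.parse_args(argv)
    return load_param_grid(args.param_grid_file)
```

With a file such as grid.json containing {"max_depth": [2, 3, 4], "n_estimators": [400, 600, 1000]}, the MLproject entry point would then only need to declare one parameter and forward it in its command, so the run could be started with something like mlflow run . -P param_grid_file=grid.json.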