
I am trying to see if MLflow is the right place to store my metrics for model tracking. According to the docs, `log_metric` takes a single key-value pair, and `log_metrics` takes a dict of key-values. I am wondering how to log something like the classification report below into MLflow so it can be visualized meaningfully.

          precision    recall  f1-score   support

  class1       0.89      0.98      0.93       174
  class2       0.96      0.90      0.93        30
  class3       0.96      0.90      0.93        30
  class4       1.00      1.00      1.00         7
  class5       0.93      1.00      0.96        13
  class6       1.00      0.73      0.85        15
  class7       0.95      0.97      0.96        39
  class8       0.80      0.67      0.73         6
  class9       0.97      0.86      0.91        37
 class10       0.95      0.81      0.88        26
 class11       0.50      1.00      0.67         5
 class12       0.93      0.89      0.91        28
 class13       0.73      0.84      0.78        19
 class14       1.00      1.00      1.00         6
 class15       0.45      0.83      0.59         6
 class16       0.97      0.98      0.97       245
 class17       0.93      0.86      0.89       206

    accuracy                           0.92       892
   macro avg       0.88      0.90      0.88       892
weighted avg       0.93      0.92      0.92       892
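
For reference, this is how I understand those two calls from the docs (a minimal sketch, not what I actually want to log):

import mlflow

with mlflow.start_run():
    mlflow.log_metric("accuracy", 0.92)                       # one key-value pair
    mlflow.log_metrics({"precision": 0.89, "recall": 0.98})   # flat dict of key-values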

Felix Gao
  • `log_metric` is used to log a metric over time, for metrics like loss, cumulative reward (in reinforcement learning), and so on. The output is a line plot that shows how the metric changes over time/steps. If the numbers in front of the classes are used as the step, then you could call `mlflow.log_metric("class_precision", precision, step=COUNTER)` for each row. Otherwise, if you want to plot this table meaningfully, I think you should plot it manually. – Matin Zivdar Mar 19 '22 at 13:55
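
A minimal sketch of the step-based variant the comment describes (the precision values and the loop are illustrative assumptions, not part of the comment):

import mlflow

# illustrative per-class precisions, e.g. the first rows of the report above
precisions = [0.89, 0.96, 0.96, 1.00]

with mlflow.start_run():
    for counter, precision in enumerate(precisions):
        # the class index serves as the "step", so MLflow plots one point per class
        mlflow.log_metric("class_precision", precision, step=counter)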

1 Answer


I searched for the same thing a few days ago, and since I still have not found anything more practical and this post was again at the top of my search results, I thought I'd share an example of the approach @Matin Zivdar already mentioned in the comments and that I have implemented for now.

Sidenotes

  • for simplicity I skipped preprocessing, rebalancing, etc.
  • it is possible to log multiple metrics (or parameters) at once in a flat dictionary (see the docs and the sketch right after this list)
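
A quick sketch of that flat-dict logging (the values here are made up):

import mlflow

with mlflow.start_run():
    # log_metrics and log_params each take one flat (non-nested) dictionary
    mlflow.log_metrics({"macro_avg_precision": 0.88, "macro_avg_recall": 0.90})
    mlflow.log_params({"min_samples_split": 2, "ccp_alpha": 0.0})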

TL;DR

Logging all performance metrics can be done with loops; here is an example for the dict returned by `classification_report()`:

# Logging all metrics in classification_report
mlflow.log_metric("accuracy", cr.pop("accuracy"))
for class_or_avg, metrics_dict in cr.items():
    for metric, value in metrics_dict.items():
        mlflow.log_metric(class_or_avg + '_' + metric, value)
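
Note that this produces keys like `macro avg_precision`, i.e. with a space taken over from the dict keys `macro avg` / `weighted avg`; MLflow accepted those in my runs, but you can replace the space with an underscore if you prefer uniform names.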

Create Sample Data / Simulate Training

import numpy as np

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.metrics import classification_report
import mlflow

# Create example data
N = 5000
n_features = 20
X, y = make_classification(n_samples=N,
                           n_features=n_features,
                           n_clusters_per_class=1,
                           weights=[0.8,0.15,0.05],
                           flip_y=0,
                           random_state=1, n_classes=3)

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)

# Start logging
mlflow.set_experiment("stackoverflow")
with mlflow.start_run():
    # Simulate Model Training
    grid_params = {
        "criterion" : ["gini","log_loss"],
        "min_samples_split": np.arange(2,6),
        "min_samples_leaf": np.linspace(0.01,0.5, num = 3),
        "ccp_alpha": np.linspace(0,3,5),
    }
    cv = StratifiedKFold(shuffle=True)
    grid_search = GridSearchCV(DecisionTreeClassifier(), grid_params, cv=cv, n_jobs=3,
                               return_train_score=False, scoring='f1_macro', verbose=1)
    grid_search.fit(X_train,y_train)

    best_model = grid_search.best_estimator_
    best_params = grid_search.best_params_
    
    # it is possible to log multiple params (and metrics) in a flat dictionary
    mlflow.log_params(best_params)
    y_pred = best_model.predict(X_test)
    cr = classification_report(y_test, y_pred, output_dict=True)
    cr  # notebook-style display of the report dict, see the output below

Output:

{'0': {'precision': 0.9461312438785504,
  'recall': 0.966,
  'f1-score': 0.9559623948540327,
  'support': 1000},
 '1': {'precision': 0.8083832335329342,
  'recall': 0.7180851063829787,
  'f1-score': 0.7605633802816901,
  'support': 188},
 '2': {'precision': 0.7903225806451613,
  'recall': 0.7903225806451613,
  'f1-score': 0.7903225806451614,
  'support': 62},
 'accuracy': 0.92,
 'macro avg': {'precision': 0.8482790193522153,
  'recall': 0.8248025623427133,
  'f1-score': 0.835616118593628,
  'support': 1250},
 'weighted avg': {'precision': 0.9176858334261937,
  'recall': 0.92,
  'f1-score': 0.9183586482775923,
  'support': 1250}}

Logging multiple metrics with MLflow

So far so good. Now, to log all metrics of the classification report, one can just iterate over the nested dictionary. I manually `.pop` accuracy first because it is the only non-nested entry in the dict:

    # Logging all metrics in classification_report
    mlflow.log_metric("accuracy", cr.pop("accuracy"))
    for class_or_avg, metrics_dict in cr.items():
        for metric, value in metrics_dict.items():
            mlflow.log_metric(class_or_avg + '_' + metric, value)
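
As a side note, not part of the metric logging above: if you also want to keep the full report for later inspection, MLflow can store the raw dict as a JSON artifact via `log_dict` (a minimal, self-contained sketch):

import mlflow

cr = {"0": {"precision": 0.95, "recall": 0.97}}  # stand-in for the report dict

with mlflow.start_run():
    # saves the dict as classification_report.json in the run's artifacts
    mlflow.log_dict(cr, "classification_report.json")
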
Björn