I searched for the same thing a few days ago, and since I still have not found anything more practicable and this post was again at the top of my search results, I thought I'd share an example of the approach @Martin Zivdar already mentioned in the comments, which I have implemented for now.
Sidenotes
- for simplicity I skipped preprocessing, rebalancing, etc.
- it is possible to log multiple metrics (or parameters) at once by passing a flat dictionary (see the docs and the short sketch below)
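For instance, mlflow.log_metrics() and mlflow.log_params() each accept such a flat dict; a minimal sketch (the metric and parameter values here are made up for illustration):

import mlflow

# several metrics/params in one call instead of repeated log_metric()/log_param()
mlflow.log_metrics({"precision": 0.91, "recall": 0.87})
mlflow.log_params({"criterion": "gini", "min_samples_split": 2})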
TL;DR
Logging all performance metrics can be done with loops; here is an example for classification_report():
# Logging all metrics in classification_report
mlflow.log_metric("accuracy", cr.pop("accuracy"))
for class_or_avg, metrics_dict in cr.items():
    for metric, value in metrics_dict.items():
        mlflow.log_metric(class_or_avg + '_' + metric, value)
Create Sample Data / Simulate Training
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.metrics import classification_report
import mlflow

# Create example data (three imbalanced classes)
N = 5000
n_features = 20
X, y = make_classification(n_samples=N,
                           n_features=n_features,
                           n_clusters_per_class=1,
                           weights=[0.8, 0.15, 0.05],
                           flip_y=0,
                           random_state=1, n_classes=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)

# Start logging
mlflow.set_experiment("stackoverflow")
with mlflow.start_run():
    # Simulate model training
    grid_params = {
        "criterion": ["gini", "log_loss"],
        "min_samples_split": np.arange(2, 6),
        "min_samples_leaf": np.linspace(0.01, 0.5, num=3),
        "ccp_alpha": np.linspace(0, 3, 5),
    }
    cv = StratifiedKFold(shuffle=True)
    grid_search = GridSearchCV(DecisionTreeClassifier(), grid_params, cv=cv,
                               n_jobs=3, return_train_score=False,
                               scoring='f1_macro', verbose=1)
    grid_search.fit(X_train, y_train)
    best_model = grid_search.best_estimator_
    best_params = grid_search.best_params_
    # it is possible to log multiple params (and metrics) in a flat dictionary
    mlflow.log_params(best_params)
    y_pred = best_model.predict(X_test)
    # output_dict=True returns the report as a nested dict instead of a string
    cr = classification_report(y_test, y_pred, output_dict=True)
cr
Output:
{'0': {'precision': 0.9461312438785504,
'recall': 0.966,
'f1-score': 0.9559623948540327,
'support': 1000},
'1': {'precision': 0.8083832335329342,
'recall': 0.7180851063829787,
'f1-score': 0.7605633802816901,
'support': 188},
'2': {'precision': 0.7903225806451613,
'recall': 0.7903225806451613,
'f1-score': 0.7903225806451614,
'support': 62},
'accuracy': 0.92,
'macro avg': {'precision': 0.8482790193522153,
'recall': 0.8248025623427133,
'f1-score': 0.835616118593628,
'support': 1250},
'weighted avg': {'precision': 0.9176858334261937,
'recall': 0.92,
'f1-score': 0.9183586482775923,
'support': 1250}}
Example: logging multiple metrics with MLflow
So far so good. Now, to log all metrics of the classification report, one can simply iterate over the nested dictionary. I manually .pop accuracy first because it is the only non-nested entry in the dict:
# Logging all metrics in classification_report
mlflow.log_metric("accuracy", cr.pop("accuracy"))
for class_or_avg, metrics_dict in cr.items():
    for metric, value in metrics_dict.items():
        mlflow.log_metric(class_or_avg + '_' + metric, value)
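Alternatively, the same nested report can be flattened into one dict and logged with a single mlflow.log_metrics() call, which is the flat-dictionary approach from the sidenotes. A sketch, assuming cr is a fresh report dict (accuracy was already popped in the loop version above) and flat_cr is just an illustrative name:

# Flatten the nested report, then log everything in one call
flat_cr = {"accuracy": cr.pop("accuracy")}
flat_cr.update({
    class_or_avg + '_' + metric: value
    for class_or_avg, metrics_dict in cr.items()
    for metric, value in metrics_dict.items()
})
mlflow.log_metrics(flat_cr)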