I have a Google Cloud Platform account with a Kubeflow Pipeline. The first component of the pipeline preprocesses some data, and the second one trains a model (a scikit-learn DecisionTreeClassifier) on that preprocessed data. For the sake of a code sample, the snippet below is a simplified version of the pipeline's second component:
import logging
import pandas as pd
import os
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics, datasets
from sklearn.model_selection import train_test_split

# Load the Iris dataset as a stand-in for the preprocessed data
iris = datasets.load_iris()
X = iris.data
y = iris.target
x_train_data, x_test_data, y_train_data, y_test_data = train_test_split(X, y, test_size=0.3, random_state=1, shuffle=True)

# Train the model
print("Creating model")
model = DecisionTreeClassifier()
print(f"Training model ({type(model)})")
model.fit(x_train_data, y_train_data)

# Predict on the train and test splits
print("Evaluating model")
y_train_pred = model.predict(x_train_data)
print("y_train_pred: ", y_train_pred.shape)
y_test_pred = model.predict(x_test_data)
print("y_test_pred: ", y_test_pred.shape)

# Training metrics
train_accuracy = metrics.accuracy_score(y_train_data, y_train_pred)
train_classification_report = metrics.classification_report(y_train_data, y_train_pred)
print("\nTraining result:")
print(f"Accuracy:\t{train_accuracy}")
print(f"Classification report:\t{type(train_classification_report)}\n{train_classification_report}")

# Test metrics
test_accuracy = metrics.accuracy_score(y_test_data, y_test_pred)
test_classification_report = metrics.classification_report(y_test_data, y_test_pred)
print("\nTesting result:")
print(f"Accuracy:\t{test_accuracy}")
print(f"Classification report:\t{type(test_classification_report)}\n{test_classification_report}")

print("\nDONE !\n")
In this sample, instead of loading the preprocessed data, I'm using the scikit-learn Iris dataset, but the output is exactly the same. Everything seems to work as intended: every print statement appears on the Kubeflow output console as expected. However, after the second component finishes executing (after the last print is correctly shown on the console), an error appears:
Traceback (most recent call last):
File "<string>", line 181, in <module>
File "<string>", line 151, in _serialize_str
TypeError: Value "None" has type "<class 'NoneType'>" instead of str.
Do you have any idea why this is happening? Am I doing something wrong, or is this a Google Cloud / Kubeflow Pipelines problem?
Thanks in advance!