I have a Google Cloud Platform account with a Kubeflow Pipeline. The first component of the pipeline preprocesses some data, and the second one trains a model (a scikit-learn DecisionTreeClassifier) on that preprocessed data. For the sake of a code sample, the snippet below is a simplified version of the pipeline's second component:
import logging
import pandas as pd
import os
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics, datasets
from sklearn.model_selection import train_test_split

# Load the Iris dataset as a stand-in for the preprocessed data
iris = datasets.load_iris()
X = iris.data
y = iris.target
x_train_data, x_test_data, y_train_data, y_test_data = train_test_split(X, y, test_size=0.3, random_state=1, shuffle=True)

# Train the model
print("Creating model")
model = DecisionTreeClassifier()
print(f"Training model ({type(model)})")
model.fit(x_train_data, y_train_data)

# Predict on the train and test splits
print("Evaluating model")
y_train_pred = model.predict(x_train_data)
print("y_train_pred: ", y_train_pred.shape)
y_test_pred = model.predict(x_test_data)
print("y_test_pred: ", y_test_pred.shape)

# Training metrics
train_accuracy = metrics.accuracy_score(y_train_data, y_train_pred)
train_classification_report = metrics.classification_report(y_train_data, y_train_pred)
print("\nTraining result:")
print(f"Accuracy:\t{train_accuracy}")
print(f"Classification report:\t{type(train_classification_report)}\n{train_classification_report}")

# Test metrics
test_accuracy = metrics.accuracy_score(y_test_data, y_test_pred)
test_classification_report = metrics.classification_report(y_test_data, y_test_pred)
print("\nTesting result:")
print(f"Accuracy:\t{test_accuracy}")
print(f"Classification report:\t{type(test_classification_report)}\n{test_classification_report}")

print("\nDONE !\n")
In this sample, instead of loading the preprocessed data, I'm using the scikit-learn Iris dataset, but the output is exactly the same. Everything seems to work as intended: every print statement appears on the Kubeflow output console as expected. However, after the second component finishes executing (after the last print is correctly shown on the console), an error appears:
Traceback (most recent call last):
File "<string>", line 181, in <module>
File "<string>", line 151, in _serialize_str
TypeError: Value "None" has type "<class 'NoneType'>" instead of str.
Do you have any idea why this is happening? Am I doing something wrong, or is this a Google Cloud / Kubeflow Pipelines problem?
Thanks in advance!