I'm using local Kubeflow Pipelines to build a continuous machine learning test project. I have one pipeline that preprocesses the data using TFX, and it saves the outputs automatically to MinIO. Outside of this pipeline, I want to train the model using TFX's Trainer, but I need the artifacts generated in the preprocessing pipeline. Is there an implemented way to import these outputs? I've looked through the documentation and some issues, but I can't find an answer. And because I'm trying to make this continuous, I can't rely on doing it manually.
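To clarify what I mean by "manually": after a run I can see the artifact URIs in the MinIO bucket and could split them into bucket/key pairs myself before downloading. A minimal sketch of that step (the URI layout and the example path are my assumptions about how KFP lays out artifacts, not a documented API):

```python
from urllib.parse import urlparse


def artifact_uri_to_minio_key(uri: str) -> tuple:
    """Split an s3:// style artifact URI into (bucket, object key).

    Hypothetical helper: the scheme and path layout here are assumptions
    about how KFP stores artifacts in MinIO, not a KFP/TFX API.
    """
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip('/')


# Example with a made-up artifact path:
bucket, key = artifact_uri_to_minio_key(
    's3://mlpipeline/artifacts/tfx-pipeline/run-123/transformed_examples')
```

This is exactly the kind of hand-wiring I'd like to replace with a supported import mechanism.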
Example of my preprocessing pipeline:
@kfp.dsl.pipeline(
    name='TFX',
    description='TFX pipeline'
)
def tfx_pipeline():
    # DL with wget, can use gcs instead as well
    fetch = kfp.dsl.ContainerOp(
        name='download',
        image='busybox',
        command=['sh', '-c'],
        arguments=[
            'sleep 1;'
            'mkdir -p /tmp/data;'
            'wget <gcp link> -O /tmp/data/results.csv'],
        file_outputs={'downloaded': '/tmp/data'})
    records_example = tfx_csv_gen(input_base=fetch.output)
    stats = tfx_statistic_gen(input_data=records_example.output)
    schema_op = tfx_schema_gen(stats.output)
    tfx_example_validator(stats=stats.outputs['output'],
                          schema=schema_op.outputs['output'])
    #tag::tft[]
    transformed_output = tfx_transform(
        input_data=records_example.output,
        schema=schema_op.outputs['output'],
        module_file=module_file)  # Path to your TFT code on GCS/S3
    #end::tft[]
and then executing it with:
kfp.compiler.Compiler().compile(tfx_pipeline, 'tfx_pipeline.zip')
client = kfp.Client()
client.list_experiments()
#exp = client.create_experiment(name='mdupdate')
my_experiment = client.create_experiment(name='tfx_pipeline')
my_run = client.run_pipeline(my_experiment.id, 'tfx', 'tfx_pipeline.zip')
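Because this has to run unattended, my current plan is to poll the preprocessing run's status and only hand its artifacts to a separate training step once it has succeeded. A sketch of that gating logic (the status strings mirror the Argo workflow phases shown in the KFP UI; the mapping itself is my own, not a KFP API):

```python
# Terminal Argo/KFP run phases, as displayed in the KFP UI.
TERMINAL_STATUSES = {'Succeeded', 'Failed', 'Error', 'Skipped'}


def next_action(run_status: str) -> str:
    """Decide what the automation should do given a run's status.

    Sketch only: 'train' means kick off the downstream Trainer step,
    'abort' means give up on this run, 'wait' means keep polling.
    """
    if run_status == 'Succeeded':
        return 'train'
    if run_status in TERMINAL_STATUSES:
        return 'abort'
    return 'wait'
```

Even with this in place, I'd still need a non-manual way to point the Trainer at the transform artifacts, which is the core of my question.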
I'm working in a .ipynb in Visual Studio Code.