The two components look like this (they're YAML files compiled from Python functions):
Component 1
class DownloadData:
    @staticmethod
    def _component_func(
        mlflow_uri: str,
        run_id: str,
        artifact_path: str,
        output_dir: str,
    ) -> str:
        import os
        from pathlib import Path
        import mlflow

        if not run_id:
            print("No run ID provided. Skipping component execution.")
            return None

        # Download artifact.
        output_dir = Path(output_dir)
        print(f"Downloading object from {mlflow_uri}/{run_id}/{artifact_path}")
        output_path = mlflow.artifacts.download_artifacts(
            run_id=run_id,
            artifact_path=artifact_path,
            tracking_uri=mlflow_uri,
            dst_path=output_dir,
        )
        print(f"Done. Saved at {output_path}")

        # Pick the first JSON file in the downloaded directory.
        output_file = [f for f in os.listdir(output_path) if f.endswith(".json")][0]
        output_path = os.path.join(output_path, output_file)
        print(f"Full save path: {output_path}")
        # return output_path
        return output_file
Component 2
class GetValueWithKey:
    @staticmethod
    def _component_func(
        filepath: str,
        key: str,
    ) -> str:
        import json
        import os

        if filepath.endswith(".json"):
            with open(file=filepath) as f:
                data = json.load(fp=f)
            print(f"Loaded data from {filepath}")
        else:
            raise NotImplementedError("Only JSON is implemented for now.")
        print(f"Using key {key} to get value {data[key]}")
        return data[key]
Basically, component 1 is supposed to download a JSON file from MLflow, and component 2 is supposed to read that JSON file and return the value stored under a specific key.
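To make the intended behaviour of component 2 concrete, here is a minimal local sketch of the same read-and-look-up logic, run outside Kubeflow. The sample file, the `model_name` key, and its value are hypothetical stand-ins for whatever DownloadData would fetch from MLflow; the function works here only because the writer and the reader share one filesystem.

```python
import json
import os
import tempfile


def get_value_with_key(filepath: str, key: str) -> str:
    """Same logic as GetValueWithKey._component_func, runnable locally."""
    if not filepath.endswith(".json"):
        raise NotImplementedError("Only JSON is implemented for now.")
    with open(filepath) as f:
        data = json.load(f)
    return data[key]


# Hypothetical stand-in for the file DownloadData would fetch from MLflow.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "data.json")
    with open(sample_path, "w") as f:
        json.dump({"model_name": "example"}, f)

    # Reader and writer see the same directory, so the path resolves.
    value = get_value_with_key(sample_path, "model_name")

print(value)
```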
Inside of my pipeline, I'm trying to connect the two with the following:
op_download_data = comp.load_component_from_file("download_data.yaml")
filename = op_download_data.output
op_get_value_with_key = comp.load_component_from_file("get_value_with_key.yaml")
op_get_value_with_key({"filepath": filename})
I'm saving my file in DownloadData to a path called /mnt/temp/data.json, but when I try to read the data I keep getting something like:
Traceback (most recent call last):
  File "/tmp/tmp.BHW6AzAKsU", line 35, in <module>
    _outputs = _component_func(**_parsed_args)
  File "/tmp/tmp.BHW6AzAKsU", line 12, in _component_func
    with open(file=filepath) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/temp/data.json'
I know that Kubeflow saves files to wherever it wants despite the user specifying a path, but how do I pass this path to a subsequent component?
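For illustration, the pattern in question can be sketched in plain Python, without Kubeflow. This is only a simulation under stated assumptions: the two directories stand in for the separate filesystems of the two component containers, and the copy step stands in for whatever the orchestrator does to move an output artifact to the next step. All paths, the `accuracy` key, and its value are hypothetical. In the KFP v1 SDK, the `kfp.components.OutputPath` and `kfp.components.InputPath` parameter annotations appear to play this role, with the system rather than the user choosing the paths.

```python
import json
import os
import shutil
import tempfile


def download_data(output_path: str) -> None:
    """Producer: writes to a path chosen by the caller, not by the function."""
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    with open(output_path, "w") as f:
        json.dump({"accuracy": "0.93"}, f)  # hypothetical payload


def get_value_with_key(input_path: str, key: str) -> str:
    """Consumer: reads from whatever path it is handed."""
    # No ".json" extension check: a system-chosen path may not carry one.
    with open(input_path) as f:
        return json.load(f)[key]


with tempfile.TemporaryDirectory() as root:
    # Two directories stand in for the filesystems of two containers.
    producer_path = os.path.join(root, "step1", "outputs", "data")
    consumer_path = os.path.join(root, "step2", "inputs", "data")

    download_data(producer_path)

    # Simulated orchestrator step: move the artifact between the two steps.
    os.makedirs(os.path.dirname(consumer_path), exist_ok=True)
    shutil.copyfile(producer_path, consumer_path)

    result = get_value_with_key(consumer_path, "accuracy")

print(result)
```

The point of the sketch is that the consumer never sees the producer's path string; it only receives a path that is valid in its own environment, which is why returning '/mnt/temp/data.json' as a plain string does not help the second component.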