The two components look like this (they're YAML files compiled from Python functions):

Component 1

class DownloadData:
    @staticmethod
    def _component_func(
        mlflow_uri: str,
        run_id: str,
        artifact_path: str,
        output_dir: str,
    ) -> str:
        import os
        from pathlib import Path

        import mlflow

        if not run_id:
            print("No run ID provided. Skipping component execution.")
            return None

        # Download artifact.
        output_dir = Path(output_dir)

        print(f"Downloading object from {mlflow_uri}/{run_id}/{artifact_path}")
        output_path = mlflow.artifacts.download_artifacts(
            run_id=run_id,
            artifact_path=artifact_path,
            tracking_uri=mlflow_uri,
            dst_path=output_dir,
        )
        print(f"Done. Saved at {output_path}")

        output_file = [f for f in os.listdir(output_path) if f.endswith(".json")][0]
        output_path = os.path.join(output_path, output_file)
        print(f"Full save path: {output_path}")

        # return output_path
        return output_file

Component 2

class GetValueWithKey:
    @staticmethod
    def _component_func(
        filepath: str,
        key: str,
    ) -> str:
        import json
        import os

        if filepath.endswith(".json"):
            with open(file=filepath) as f:
                data = json.load(fp=f)
            print(f"Loaded data from {filepath}")
        else:
            raise NotImplementedError("Only JSON is implemented for now.")

        print(f"Using key {key} to get value {data[key]}")
        return data[key]

Component 1 is supposed to download a JSON file from MLflow, and component 2 is supposed to read that JSON file and return the value of a specific key.
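To make the intended behavior concrete, here is a minimal local sketch of the same flow with no Kubeflow or MLflow involved. `fake_download` is a hypothetical stand-in for the MLflow download (it only writes a small JSON file with made-up contents), and `get_value_with_key` mirrors component 2. When both steps share one filesystem, this works as expected:

```python
import json
import os
import tempfile


def fake_download(output_dir: str) -> str:
    """Hypothetical stand-in for component 1: write a JSON file, return its full path."""
    os.makedirs(output_dir, exist_ok=True)
    output_path = os.path.join(output_dir, "data.json")
    with open(output_path, "w") as f:
        json.dump({"accuracy": "0.93"}, f)  # made-up example content
    return output_path


def get_value_with_key(filepath: str, key: str) -> str:
    """Mirror of component 2: load the JSON file and return the value under key."""
    if not filepath.endswith(".json"):
        raise NotImplementedError("Only JSON is implemented for now.")
    with open(filepath) as f:
        data = json.load(f)
    return data[key]


# Both steps run in the same process and see the same temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = fake_download(tmp_dir)
    value = get_value_with_key(path, "accuracy")
    print(value)
```

The problem only appears once the two functions run as separate pipeline steps.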

In my pipeline, I'm trying to connect the two as follows:

op_download_data = comp.load_component_from_file("download_data.yaml")
filename = op_download_data.output
op_get_value_with_key = comp.load_component_from_file("get_value_with_key.yaml")
op_get_value_with_key({"filepath": filename})

In DownloadData, I save the file to /mnt/temp/data.json, but when I try to read it in the next component I keep getting an error like this:

Traceback (most recent call last):
  File "/tmp/tmp.BHW6AzAKsU", line 35, in <module>
    _outputs = _component_func(**_parsed_args)
  File "/tmp/tmp.BHW6AzAKsU", line 12, in _component_func
    with open(file=filepath) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/temp/data.json'

I know that Kubeflow decides where component outputs are actually stored, regardless of the path I specify, but how do I pass that path to a subsequent component?
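As far as I understand, each component runs in its own container, so a path written by one step does not exist in the next, and the file itself has to be handed over by the orchestrator. The sketch below only simulates that with two temporary directories in plain Python; `producer`, `consumer`, and the copy step are hypothetical and are not Kubeflow API. I believe the KFP v1 SDK expresses the same idea with the `OutputPath` and `InputPath` annotations from `kfp.components`, but that is an assumption on my part.

```python
import json
import os
import shutil
import tempfile


def producer(output_path: str) -> None:
    """Write JSON to the path supplied by the caller, not a self-chosen one."""
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    with open(output_path, "w") as f:
        json.dump({"threshold": 0.5}, f)  # made-up example content


def consumer(input_path: str, key: str):
    """Read JSON from the path supplied by the caller and return data[key]."""
    with open(input_path) as f:
        return json.load(f)[key]


# Two temporary directories stand in for the filesystems of two containers.
with tempfile.TemporaryDirectory() as fs_a, tempfile.TemporaryDirectory() as fs_b:
    produced = os.path.join(fs_a, "outputs", "data.json")
    producer(produced)

    # The file written in "container A" is not visible in "container B".
    path_in_b = os.path.join(fs_b, "outputs", "data.json")
    missing_before_copy = not os.path.exists(path_in_b)

    # Simulated orchestrator: transfer the artifact, then hand B its local path.
    os.makedirs(os.path.dirname(path_in_b), exist_ok=True)
    shutil.copy(produced, path_in_b)
    result = consumer(path_in_b, "threshold")

print(missing_before_copy, result)
```

In this simulation the consumer never sees the producer's original path, only the one it is given, which seems to match the FileNotFoundError above.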

Sean