I'm trying to run a TFX pipeline with BeamDagRunner, using Dataflow both to orchestrate the pipeline and to execute the TFX components. However, the components never run: their Dataflow jobs fail with an error saying setup.py cannot be found. I believe the component Dataflow jobs receive the Beam pipeline argument --setup_file=/path/to/setup.py, but that path exists only on my local machine, not on the Dataflow worker that runs the orchestrator. Is there a way to pass the setup file to the component pipeline args so that it resolves correctly? Everything works as expected when I orchestrate with the DirectRunner, because setup.py is then found on the local path.
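To make the hypothesis concrete: my understanding is that Beam checks a --setup_file path against the local filesystem of whichever process launches the Beam job, and raises an error if the file is missing there. The helper `resolve_setup_file` below is hypothetical (it is not part of TFX or Beam); it only mimics that check so you can see why a relative path like `./setup.py` resolves on one machine and not on another.

```python
import os
from typing import Optional, Sequence, Tuple


def resolve_setup_file(beam_args: Sequence[str]) -> Tuple[Optional[str], bool]:
    """Hypothetical helper: find --setup_file in Beam args and check it locally.

    Returns (absolute_path, exists_on_this_machine), or (None, False) if the
    flag is absent. A relative path such as './setup.py' is resolved against
    the current working directory of the process running this code, so the
    same arg list can pass on a laptop and fail on a remote worker.
    Only the '--flag=value' form used in the snippet is handled.
    """
    for arg in beam_args:
        if arg.startswith('--setup_file='):
            path = arg.split('=', 1)[1]
            resolved = os.path.abspath(path)  # resolved relative to os.getcwd()
            return resolved, os.path.isfile(resolved)
    return None, False


if __name__ == '__main__':
    args = ['--setup_file=./setup.py', '--runner=DataflowRunner']
    path, exists = resolve_setup_file(args)
    print(f'{path} exists here: {exists}')
```

Run from the project root on my machine, this would report True; the same check on the orchestrator's Dataflow worker would presumably report False, since `./setup.py` is resolved against that worker's working directory.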
Small snippet:
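from tfx.orchestration.beam.beam_dag_runner import BeamDagRunner
from tfx.orchestration import pipeline

BeamDagRunner(
    # Args for the Beam pipeline that runs the orchestrator itself.
    beam_orchestrator_args=[
        '--setup_file=./setup.py',
        '--runner=DataflowRunner',
    ]
).run(
    pipeline.Pipeline(
        ...,  # other Pipeline arguments elided
        # Args passed to each component's Beam (Dataflow) job.
        beam_pipeline_args=[
            '--setup_file=./setup.py',
            '--runner=DataflowRunner',
        ]
    )
)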
This snippet should run the orchestrator on Dataflow and also execute the components on Dataflow. However, the component jobs fail because setup.py can't be found.