I'm using the function below:
```python
import apache_beam as beam
from apache_beam.io import fileio
from apache_beam.io.gcp.bigquery import WriteToBigQuery

# pipeline_options, user_options and DataTransformFn are defined elsewhere in my module.


def final_pipeline():
    with beam.Pipeline(options=pipeline_options) as pipeline:
        readable_files = (
            pipeline
            | fileio.MatchFiles(file_pattern="gs://bucket_name/zipfile/*.csv")
            | fileio.ReadMatches()
            | beam.Reshuffle()
        )
        files_parsing = (
            readable_files
            | beam.ParDo(DataTransformFn())
            | WriteToBigQuery(user_options.TABLE_NAME)
        )


if __name__ == '__main__':
    final_pipeline()
```
Is there a way in Apache Beam to dynamically pass the staging bucket location as an argument to the `final_pipeline()` function or to `beam.Pipeline()`, so that the source location of the files can be specified flexibly for subsequent pipelines on Google Cloud Platform (GCP) Dataflow?
I don't want to use a hard-coded match file pattern, i.e. `fileio.MatchFiles()`, because that leads to a different Dataflow job for each file. Instead, I want to pass a dynamic bucket path to the `final_pipeline()` function in `main`, which I'll fetch from Cloud Composer using `user_options`.
I need a single Dataflow job to handle different file types (.csv, .txt, .asc, etc.), so instead of a fixed file-match pattern I need to supply a dynamic path.
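Roughly, this is the kind of thing I have in mind (just a sketch, not working code; `MyOptions` and the `--input_path` argument are names I made up, and I've kept `fileio.MatchFiles()` only as a stand-in for whatever read step ends up being used; the point is that the path would be supplied at launch time rather than hardcoded):

```python
# Rough sketch of the idea: `MyOptions` and `--input_path` are placeholder names.
import apache_beam as beam
from apache_beam.io import fileio
from apache_beam.options.pipeline_options import PipelineOptions


class MyOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        # Composer would set this per run, e.g. --input_path=gs://bucket_name/incoming/*
        parser.add_argument('--input_path',
                            help='GCS path or pattern to read for this run')


def final_pipeline(argv=None):
    options = PipelineOptions(argv)
    input_path = options.view_as(MyOptions).input_path
    with beam.Pipeline(options=options) as pipeline:
        readable_files = (
            pipeline
            | fileio.MatchFiles(file_pattern=input_path)  # path comes from the option
            | fileio.ReadMatches()
            | beam.Reshuffle()
        )
        # ... ParDo(DataTransformFn()) and WriteToBigQuery as in the code above ...


if __name__ == '__main__':
    final_pipeline()
```

Would something like this work when the job is launched from Composer, or is there a better way to pass the path into the pipeline?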