I'm trying to load data from a GCS bucket and publish the content to Pub/Sub and BigQuery. These are my pipeline options:
options = PipelineOptions(
    project=project,
    temp_location="gs://dataflow-example-bucket6721/temp21/",
    region="us-east1",
    job_name="dataflow2-pubsub-09072021",
    machine_type="e2-standard-2",
)
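In case it matters, my understanding is that the keyword form above should be equivalent to setting the same values through the option views, roughly like the untested sketch below; please correct me if the kwargs are not actually picked up this way.

from apache_beam.options.pipeline_options import (
    PipelineOptions,
    GoogleCloudOptions,
    WorkerOptions,
)

# Sketch of what I believe is the equivalent explicit form (untested)
options = PipelineOptions()
gcp_options = options.view_as(GoogleCloudOptions)
gcp_options.project = project
gcp_options.temp_location = "gs://dataflow-example-bucket6721/temp21/"
gcp_options.region = "us-east1"
gcp_options.job_name = "dataflow2-pubsub-09072021"
options.view_as(WorkerOptions).machine_type = "e2-standard-2"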
And this is my pipeline:
data = p | "CreateData" >> beam.Create(sum([fileName()], []))
jsonFile = data | "filterJson" >> beam.Filter(filterJsonfile)
JsonData = jsonFile | "JsonData" >> beam.Map(readFromJson)

# Split the records into valid and invalid tagged outputs
split_data = JsonData | "Split Data" >> beam.ParDo(CheckForValidData()).with_outputs("ValidData", "InvalidData")
ValidData = split_data.ValidData
InvalidData = split_data.InvalidData
data_ = split_data[None]  # main (untagged) output

publish_data = ValidData | "Publish msg" >> beam.ParDo(publishMsg())

ToBQ = ValidData | "To BQ" >> beam.io.WriteToBigQuery(
    table_spec,
    # schema=table_schema,
    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
)
The data flows fine with the InteractiveRunner, but with the DataflowRunner it fails with an error like:
ValueError: Invalid GCS location: None. Writing to BigQuery with FILE_LOADS method requires a GCS location to be provided to write files to be loaded into BigQuery. Please provide a GCS bucket through custom_gcs_temp_location in the constructor of WriteToBigQuery or the fallback option --temp_location, or pass method="STREAMING_INSERTS" to WriteToBigQuery. [while running '[15]: To BQ/BigQueryBatchFileLoads/GenerateFilePrefix']
The error complains about a missing GCS location and suggests adding temp_location, but I have already set temp_location in my pipeline options.
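The message also mentions custom_gcs_temp_location as a fallback. Would passing it explicitly to WriteToBigQuery, roughly as in the sketch below, be the expected workaround, or should the temp_location in my options already cover this? (The path here is just my existing temp bucket; I haven't confirmed this fixes anything.)

ToBQ = ValidData | "To BQ" >> beam.io.WriteToBigQuery(
    table_spec,
    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
    # Assumption: reusing the same bucket I already pass as temp_location
    custom_gcs_temp_location="gs://dataflow-example-bucket6721/temp21/",
)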