
I use Python and Apache Beam to read streaming data from Kafka and insert it into a BigQuery table, but I want to insert the data in batches rather than with streaming inserts.

I tried setting the pipeline's streaming mode to True and adding a batch size to the WriteToBigQuery method, but the data was still inserted into the BigQuery table via streaming inserts. I also tried setting streaming mode to False, but there is too much data in the Kafka topic to read and the pipeline got stuck. Is there any way to do this? A sketch of my current setup is below.
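
For context, the setup described above looks roughly like this (a minimal sketch, not the actual code; the broker, topic, table, schema, and parsing step are placeholders):

    import apache_beam as beam
    from apache_beam.io.kafka import ReadFromKafka
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True)  # streaming mode, as described above

    with beam.Pipeline(options=options) as pipeline:
        (pipeline
         | 'ReadKafka' >> ReadFromKafka(
             consumer_config={'bootstrap.servers': 'broker:9092'},  # placeholder broker
             topics=['events'])                                     # placeholder topic
         | 'Parse' >> beam.Map(lambda kv: {'payload': kv[1].decode('utf-8')})  # placeholder parsing
         | 'WriteBQ' >> beam.io.WriteToBigQuery(
             'my-project:my_dataset.my_table',  # placeholder table
             schema='payload:STRING',           # placeholder schema
             # batch_size only groups rows per streaming-insert request;
             # it does not switch the sink away from the streaming API.
             batch_size=500))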

  • Are you using the Storage Write API? By default, the Apache Beam Python SDK uses the legacy streaming API. See https://beam.apache.org/documentation/io/built-in/google-bigquery/#storage-write-api. – Joe Moore Sep 01 '23 at 12:06
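
Following up on that comment: switching WriteToBigQuery off the default streaming inserts might look like the sketch below. This is untested and the table and schema are placeholders; FILE_LOADS turns the writes into periodic batch load jobs, and STORAGE_WRITE_API (per the comment) is the newer alternative.

    import apache_beam as beam

    # In a streaming pipeline, FILE_LOADS needs a triggering_frequency (or
    # with_auto_sharding) so Beam knows how often to kick off a load job.
    rows | beam.io.WriteToBigQuery(
        'my-project:my_dataset.my_table',  # placeholder table
        schema='payload:STRING',           # placeholder schema
        method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
        triggering_frequency=300,          # one batch load job roughly every 5 minutes
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)

    # Alternatively, method=beam.io.WriteToBigQuery.Method.STORAGE_WRITE_API
    # uses the Storage Write API the comment refers to.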

0 Answers