
I have a Dataflow job that reads from a BigQuery table (created on top of a Bigtable table). The Dataflow job is created from a custom template in Java. I need to process around 500 million records from BigQuery. The issue I am facing is that even reading 1 million records takes 26 minutes on the BigQuery side, and the whole Dataflow job takes 36 minutes. The BigQuery read is far too slow.
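To put these numbers in perspective, here is a back-of-the-envelope projection in plain Java. It assumes, purely for illustration, that the observed read rate stays constant as the volume grows, which is unlikely to hold once BigQuery and Dataflow scale out. The class and method names are hypothetical.

```java
class ReadThroughput {

    // Records per second observed for a read of `records` rows that took `minutes` minutes.
    static double recordsPerSecond(long records, double minutes) {
        return records / (minutes * 60.0);
    }

    // Projected wall-clock hours for `targetRecords` rows.
    // Assumption (illustrative only): the rate stays constant at `rate` records/s.
    static double projectedHours(long targetRecords, double rate) {
        return targetRecords / rate / 3600.0;
    }

    public static void main(String[] args) {
        double rate = recordsPerSecond(1_000_000L, 26.0);   // 1M rows read in 26 min
        double hours = projectedHours(500_000_000L, rate);  // full 500M-row load at that rate
        System.out.printf("observed rate: %.0f records/s%n", rate);
        System.out.printf("projected time for 500M records: %.1f hours (%.1f days)%n",
                hours, hours / 24.0);
    }
}
```

Run as-is, it reports about 641 records/s and roughly 216.7 hours (about 9 days) for 500 million records at the current rate.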

Any suggestions on how to improve the read performance?

Does the Apache Beam programming model support reading from a source in parallel? Is there an I/O connector available for reading from Bigtable in parallel? Any help will be highly appreciated.
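Conceptually, a parallel read works by splitting the input (for a Bigtable-style store, a row-key range) into non-overlapping sub-ranges that separate workers read independently; Beam's bounded sources can be split in this way so the runner can distribute the pieces. The sketch below illustrates only that idea in plain Java with a thread pool. It is not Beam's API, and `readRange` is a hypothetical stand-in that just counts keys instead of reading rows from storage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelRangeRead {

    // A half-open key range [start, end) that one worker reads independently.
    static final class KeyRange {
        final long start;
        final long end;
        KeyRange(long start, long end) { this.start = start; this.end = end; }
    }

    // Split [start, end) into at most `shards` contiguous, non-overlapping sub-ranges.
    static List<KeyRange> split(long start, long end, int shards) {
        List<KeyRange> ranges = new ArrayList<>();
        long total = end - start;
        for (int i = 0; i < shards; i++) {
            long s = start + total * i / shards;
            long e = start + total * (i + 1) / shards;
            if (s < e) ranges.add(new KeyRange(s, e));
        }
        return ranges;
    }

    // Hypothetical stand-in for reading one range from storage: returns the number of keys.
    static long readRange(KeyRange r) {
        return r.end - r.start;
    }

    // Read every sub-range on a fixed-size thread pool and sum the per-range counts.
    static long readInParallel(long start, long end, int shards, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (KeyRange r : split(start, end, shards)) {
                results.add(pool.submit(() -> readRange(r)));
            }
            long total = 0;
            for (Future<Long> f : results) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readInParallel(0, 1_000_000, 16, 4));
    }
}
```

In a real pipeline the splitting and scheduling would be handled by the connector and the runner rather than a hand-written thread pool.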

Ankit Gautam
  • Probably need to look at the amount of data scanned or check the query for possible optimizations. On the other hand, are you using reserved slots? Or using BigQuery on-demand? If using a reservation it might affect/limit the ability of BigQuery to scale, unless you add more slots. – Bruno Volpato Dec 06 '22 at 02:14
  • Is this still an issue? Please share the details above. – Bruno Volpato Jan 05 '23 at 02:07
