
Is there a way to set the maximum number of bad records when writing with BigQueryIO? It appears to stay at the default of 0.

user2254391

1 Answer


At this time, unfortunately, we don't provide a way to directly set the value of configuration.load.maxBadRecords for BigQueryIO in Cloud Dataflow.

As a workaround, you should be able to apply a custom ParDo transform that filters "bad records" before they are passed to BigQueryIO.Write. As a result, BigQuery shouldn't get any "bad records". Hopefully, this helps.
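As a rough illustration of that workaround, a filtering DoFn in the Dataflow Java SDK 1.x style might look like the sketch below. The required "timestamp" field and the validity check itself are placeholders, not part of the answer; substitute whatever makes a record "bad" for your table schema.

```java
import com.google.api.services.bigquery.model.TableRow;
import com.google.cloud.dataflow.sdk.transforms.DoFn;

/** Drops rows that would be rejected by BigQuery before they reach BigQueryIO.Write. */
class FilterBadRecords extends DoFn<TableRow, TableRow> {
  @Override
  public void processElement(ProcessContext c) {
    TableRow row = c.element();
    // Placeholder check: treat a row as valid only if it has a "timestamp" field.
    // Replace this with validation that matches your own schema; rows that fail
    // the check are simply not emitted, so they never reach the BigQuery load.
    if (row.get("timestamp") != null) {
      c.output(row);
    }
  }
}

// Wiring it in, just before the write (table name is illustrative):
//   rows.apply(ParDo.of(new FilterBadRecords()))
//       .apply(BigQueryIO.Write.to("project:dataset.table"));
```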

If the ability to control configuration.load.maxBadRecords is important to you, you are welcome to file a feature request in the issue tracker of our GitHub repository.

Davor Bonaci
  • OK. Do you have an example of how to do the specification in .fromQuery? That seems to be a read method, and not a write (load) method. – user2254391 Aug 09 '15 at 19:47
  • @DavorBonaci could you elaborate on how one would go about filtering bad records? I asked that question here: http://stackoverflow.com/questions/35180012/validating-rows-before-inserting-into-bigquery-from-dataflow (is there a way to validate a `TableRow` against a `TableSchema`, for example?) – Theo Feb 03 '16 at 14:42
  • Ack. Let's continue the conversation there. – Davor Bonaci Feb 06 '16 at 21:33