Is there a way to set the maximum number of bad records when writing to BigQueryIO? It seems to keep the default at 0.
1 Answer
At this time, unfortunately, we don't provide a way to directly set the value of `configuration.load.maxBadRecords` for `BigQueryIO` in Cloud Dataflow.

As a workaround, you should be able to apply a custom `ParDo` transform that filters "bad records" before they are passed to `BigQueryIO.Write`. As a result, BigQuery shouldn't get any "bad records". Hopefully, this helps.

If the ability to control `configuration.load.maxBadRecords` is important to you, you are welcome to file a feature request in the issue tracker of our GitHub repository.
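For illustration, such a filter might look like the sketch below. This assumes the Cloud Dataflow Java SDK of that era (class names differ slightly under current Apache Beam), and `FilterBadRecordsFn`, `REQUIRED_FIELDS`, and the null check are hypothetical placeholders for whatever validation your actual schema requires.

```java
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.values.PCollection;

public class FilterBadRecords {

  /** Drops rows that the load job would reject, e.g. rows missing required fields. */
  static class FilterBadRecordsFn extends DoFn<TableRow, TableRow> {
    // Hypothetical: the fields your table schema marks as REQUIRED.
    private static final String[] REQUIRED_FIELDS = {"id", "timestamp"};

    @Override
    public void processElement(ProcessContext c) {
      TableRow row = c.element();
      for (String field : REQUIRED_FIELDS) {
        if (row.get(field) == null) {
          return;  // drop the bad record; it never reaches BigQueryIO.Write
        }
      }
      c.output(row);  // only valid rows flow downstream
    }
  }

  /** Inserts the filter in front of the sink; tableSpec and schema are placeholders. */
  static void writeValidRows(PCollection<TableRow> rows, String tableSpec, TableSchema schema) {
    rows.apply(ParDo.of(new FilterBadRecordsFn()))
        .apply(BigQueryIO.Write.to(tableSpec).withSchema(schema));
  }
}
```

Rows that fail the check are simply not emitted, so the load job only ever sees records it can accept; a more elaborate version could route rejected rows to a side output for logging instead of silently dropping them.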

Davor Bonaci
- OK. Do you have an example of how to do the specification in `.fromQuery`? That seems to be a read method, and not a write (load) method. – user2254391 Aug 09 '15 at 19:47
- @DavorBonaci could you elaborate on how one would go about filtering bad records? I asked that question here: http://stackoverflow.com/questions/35180012/validating-rows-before-inserting-into-bigquery-from-dataflow (is there a way to validate a `TableRow` against a `TableSchema`, for example?) – Theo Feb 03 '16 at 14:42
- Ack. Let's continue the conversation there. – Davor Bonaci Feb 06 '16 at 21:33