I am reading all files from a directory and writing their contents to a BigQuery table.

If any file in the directory causes an error, the pipeline raises the error and the job stops. The log gives no information about the file involved: the name of the file that raised the error does not appear anywhere.

import apache_beam as beam
from apache_beam.io import ReadFromText

# pipeline_options, known_args and string_format are defined elsewhere in the script.
with beam.Pipeline(options=pipeline_options) as p:
    read_rec = p | 'Read Files' >> ReadFromText('gs://MyBucket/MyDir/*.gz')
    read_str = read_rec | 'Map to Json' >> beam.Map(string_format)
    write_rec = read_str | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
        known_args.output,
        schema='string_field_0:STRING',
        create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
    )

Is there any way to skip the failed file and continue with the next one, or at least to log the name of the file where the error occurred?
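To make the desired behaviour concrete, here is a minimal sketch of "read each file on its own, log the name of any file that fails, and carry on". It uses only the Python standard library and local gzip files, so it is not the Beam pipeline above. The helper names `read_gzip_lines_safely` and `read_all` are hypothetical. Inside Beam, the same try/except could presumably live in a `DoFn` fed by `fileio.MatchFiles` and `fileio.ReadMatches`, where each element exposes its path as `metadata.path`, but that wiring is an assumption and is not shown here.

```python
import gzip
import logging
import zlib


def read_gzip_lines_safely(path):
    """Read one gzip file; return (lines, error) instead of raising.

    Hypothetical helper: on failure it logs the file name and returns
    an empty list together with the exception that was caught.
    """
    try:
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            return [line.rstrip("\n") for line in handle], None
    except (OSError, EOFError, UnicodeDecodeError, zlib.error) as exc:
        # The file name goes into the log so the bad input can be found later.
        logging.error("Skipping file %s: %s", path, exc)
        return [], exc


def read_all(paths):
    """Read every file in `paths`, skipping (and recording) the ones that fail."""
    good_lines = []
    failed_paths = []
    for path in paths:
        lines, error = read_gzip_lines_safely(path)
        if error is None:
            good_lines.extend(lines)
        else:
            failed_paths.append(path)
    return good_lines, failed_paths
```

Because a file's lines are only kept once the whole file has been read, a file that fails halfway contributes no partial records. The list of failed paths could be written to a separate location for later inspection.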

kalehmann
Shravani G

0 Answers