
I need to run a batch job from GCS to BigQuery via Dataflow and Beam. All my files are Avro with the same schema. I've created a Dataflow Java application that succeeds on a smaller set of data (~1 GB, about 5 files). But when I try to run it on a bigger set of data (>500 GB, >1000 files), I receive this error message:

java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: Failed to create load job with id prefix 1b83679a4f5d48c5b45ff20b2b822728_6e48345728d4da6cb51353f0dc550c1b_00001_00000, reached max retries: 3, last failed load job: ...

After 3 retries it terminates with:

Workflow failed. Causes: S57....... A work item was attempted 4 times without success....

This step is the load to BigQuery.

Stackdriver says the processing is stuck in step ... for 10m00s ... and:

Request failed with code 409, performed 0 retries due to IOExceptions, performed 0 retries due to unsuccessful status codes.....

I looked up the 409 error code, which indicates that a conflicting job, dataset, or table may already exist. I removed all the tables and re-ran the application, but it still shows the same error message.

I am currently limited to 65 workers, each an n1-standard-4 machine.

I believe there are other ways to move the data from GCS to BigQuery, but I need to demonstrate Dataflow.
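
For reference, the pipeline is structured roughly like this (a minimal sketch rather than my exact code; the schema, bucket, project, and table names are placeholders):

    import com.google.api.services.bigquery.model.TableFieldSchema;
    import com.google.api.services.bigquery.model.TableRow;
    import com.google.api.services.bigquery.model.TableSchema;
    import java.util.Arrays;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.AvroIO;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.values.TypeDescriptor;

    public class GcsAvroToBigQuery {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // All input files share one schema (a toy two-field schema for illustration).
        Schema avroSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"string\"},"
                + "{\"name\":\"value\",\"type\":\"long\"}]}");

        TableSchema tableSchema = new TableSchema().setFields(Arrays.asList(
            new TableFieldSchema().setName("id").setType("STRING"),
            new TableFieldSchema().setName("value").setType("INTEGER")));

        p.apply(AvroIO.readGenericRecords(avroSchema).from("gs://my-bucket/input/*.avro"))
            .apply(MapElements.into(TypeDescriptor.of(TableRow.class))
                .via((GenericRecord r) -> new TableRow()
                    .set("id", r.get("id").toString())
                    .set("value", r.get("value"))))
            .setCoder(TableRowJsonCoder.of())
            .apply(BigQueryIO.writeTableRows()
                // Only letters, digits, and underscores in the table name.
                .to("my-project:my_dataset.my_table")
                .withSchema(tableSchema)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

        p.run();
      }
    }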

  • Are you using a destination table on the BigQuery side? Which create and write dispositions have you specified? – enle lin Apr 20 '18 at 10:48
  • The destination table has been set. If I run the Dataflow job with a smaller batch (~1 GB) it loads just fine. The dispositions are: .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED) .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)); – andrew Apr 23 '18 at 22:05
  • That is strange... Could you try running the job with the large dataset first and then running it again with the small dataset? – enle lin Apr 26 '18 at 14:18
  • I am also getting the same error while loading large data. Were you able to resolve this? Please let me know; I am still looking for a solution. Thanks! – Akash5288 Mar 16 '19 at 20:23
  • @andrew long shot, but are you possibly including restricted characters in your table name? I got the same vague error message, and it turned out to be the inclusion of hyphens in my table name. – DeaconDesperado Aug 29 '19 at 16:56
  • ^ This was exactly it @andrew. Thank you – Colin Schoen May 01 '20 at 17:47

3 Answers


"java.lang.RuntimeException: Failed to create job with prefix beam_load_csvtobigqueryxxxxxxxxxxxxxx, reached max retries: 3, last failed job: null. at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJob.runJob(BigQueryHelpers.java:198)..... "

  • One possible cause could be a privilege issue. Ensure that the account interacting with BigQuery has the "bigquery.jobs.create" permission, which is included in the predefined role "BigQuery User" (see the quick check below).
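
A quick way to verify job-creation rights from Java (a sketch assuming the google-cloud-bigquery client library and application default credentials; any query creates a job, so this exercises bigquery.jobs.create):

    import com.google.cloud.bigquery.BigQuery;
    import com.google.cloud.bigquery.BigQueryException;
    import com.google.cloud.bigquery.BigQueryOptions;
    import com.google.cloud.bigquery.QueryJobConfiguration;

    public class JobPermissionCheck {
      public static void main(String[] args) throws Exception {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        try {
          // Running even a trivial query creates a BigQuery job.
          bigquery.query(QueryJobConfiguration.of("SELECT 1"));
          System.out.println("bigquery.jobs.create: OK");
        } catch (BigQueryException e) {
          // A 403 here typically means the account cannot create jobs.
          System.out.println("Job creation failed: " + e.getMessage());
        }
      }
    }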
– Muthu

Posting the comment of @DeaconDesperado as community wiki: they experienced the same error, and removing the restricted characters from the table name (in their case, hyphens) made the error go away. A small defensive sanitizer is sketched below.
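
A minimal sketch of such a sanitizer (sanitizeTableName is an illustrative helper, not part of the Beam or BigQuery API; it assumes you want to keep only letters, digits, and underscores):

    // Replace anything other than letters, digits, and underscores,
    // which are safe characters for a BigQuery table name.
    static String sanitizeTableName(String raw) {
      return raw.replaceAll("[^A-Za-z0-9_]", "_");
    }

For example, sanitizeTableName("my-table-2018") returns "my_table_2018".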

– Ricco D

I got the same problem using "roles/bigquery.jobUser", "roles/bigquery.dataViewer", and "roles/bigquery.user", but the issue was only resolved once I granted "roles/bigquery.admin".

– Tor Hovland