I'm using Apache Beam's BigQueryIO to load data into BigQuery, but the load job fails with this error:
"message": "Error while reading data, error message: JSON parsing error in row starting at position 0: No such field: Field_name.",
Below is the complete configuration from the load job:
"configuration": {
"jobType": "LOAD",
"load": {
"createDisposition": "CREATE_NEVER",
"destinationTable": {
"datasetId": "people",
"projectId": "my_project",
"tableId": "beam_load_test"
},
"ignoreUnknownValues": false,
"schema": {
"fields": [
{
"mode": "NULLABLE",
"name": "First_name",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "Last_name",
"type": "STRING"
}
]
},
"schemaUpdateOptions": [
"ALLOW_FIELD_ADDITION"
],
"sourceFormat": "NEWLINE_DELIMITED_JSON",
"sourceUris": [
"gs://tmp_bucket/BigQueryWriteTemp/beam_load/043518a3-7bae-48ac-8068-f97430c32f58"
],
"useAvroLogicalTypes": false,
"writeDisposition": "WRITE_APPEND"
}
I can see that the temp files BigQueryIO created in GCS look as they should, and the table schema is present in the job configuration; it is being inferred from the Beam schema via useBeamSchema().
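For reference, a row matching the configured schema would look like this in the newline-delimited JSON temp files (the values here are made up):

{"First_name": "Jane", "Last_name": "Doe"}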
Here is my pipeline code that writes to BigQuery:
pipeline.apply(
    "Write data to BQ",
    BigQueryIO
        .<GenericRecord>write()
        .optimizedWrites()
        .useBeamSchema()
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
        .withSchemaUpdateOptions(ImmutableSet.of(BigQueryIO.Write.SchemaUpdateOption.ALLOW_FIELD_ADDITION))
        .withCustomGcsTempLocation(options.getGcsTempLocation())
        .withNumFileShards(options.getNumShards().get())
        // FILE_LOADS is BigQueryIO.Write.Method.FILE_LOADS (static import)
        .withMethod(FILE_LOADS)
        .withTriggeringFrequency(Utils.parseDuration("10s"))
        .to(new TableReference()
            .setProjectId(options.getGcpProjectId().get())
            .setDatasetId(options.getGcpDatasetId().get())
            .setTableId(options.getGcpTableId().get())));
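For context, the elements being written are Avro GenericRecords whose field names match the table columns. This is only an illustrative sketch of the record shape (the Avro schema string and values are made up, not my actual upstream code):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

// Illustrative Avro schema mirroring the BigQuery table: two nullable strings.
Schema avroSchema = new Schema.Parser().parse(
    "{\"type\":\"record\",\"name\":\"Person\",\"fields\":["
        + "{\"name\":\"First_name\",\"type\":[\"null\",\"string\"],\"default\":null},"
        + "{\"name\":\"Last_name\",\"type\":[\"null\",\"string\"],\"default\":null}]}");

// One record; the field names match the BigQuery column names exactly.
GenericRecord record = new GenericData.Record(avroSchema);
record.put("First_name", "Jane");
record.put("Last_name", "Doe");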
Any ideas why new fields are not being added to the table, even though ALLOW_FIELD_ADDITION is set?