
I'm using Apache Beam's BigQueryIO to load data into BigQuery, but the load job fails with this error:

"message": "Error while reading data, error message: JSON parsing error in row starting at position 0: No such field: Field_name.",

Below is the complete configuration from the load job:

      "configuration": {
    "jobType": "LOAD",
    "load": {
      "createDisposition": "CREATE_NEVER",
      "destinationTable": {
        "datasetId": "people",
        "projectId": "my_project",
        "tableId": "beam_load_test"
      },
      "ignoreUnknownValues": false,
      "schema": {
        "fields": [
          {
            "mode": "NULLABLE",
            "name": "First_name",
            "type": "STRING"
          },
          {
            "mode": "NULLABLE",
            "name": "Last_name",
            "type": "STRING"
          }
        ]
      },
      "schemaUpdateOptions": [
        "ALLOW_FIELD_ADDITION"
      ],
      "sourceFormat": "NEWLINE_DELIMITED_JSON",
      "sourceUris": [
        "gs://tmp_bucket/BigQueryWriteTemp/beam_load/043518a3-7bae-48ac-8068-f97430c32f58"
      ],
      "useAvroLogicalTypes": false,
      "writeDisposition": "WRITE_APPEND"
    }
  

I can see that the temp files it has created in GCS look as they should, and the schema is being provided as well, inferred via useBeamSchema().
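
To illustrate, a row in those temp files carries the new field on top of the two existing columns, along these lines (values are illustrative; the field name is taken from the error message):

    {"First_name": "Ada", "Last_name": "Lovelace", "Field_name": "some value"}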

Here is my pipeline code that writes to BigQuery:

pipeline.apply(
        "Write data to BQ",
        BigQueryIO.<GenericRecord>write()
                .optimizedWrites()
                .useBeamSchema()
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
                .withSchemaUpdateOptions(ImmutableSet.of(BigQueryIO.Write.SchemaUpdateOption.ALLOW_FIELD_ADDITION))
                .withCustomGcsTempLocation(options.getGcsTempLocation())
                .withNumFileShards(options.getNumShards().get())
                .withMethod(FILE_LOADS)
                .withTriggeringFrequency(Utils.parseDuration("10s"))
                .to(new TableReference()
                        .setProjectId(options.getGcpProjectId().get())
                        .setDatasetId(options.getGcpDatasetId().get())
                        .setTableId(options.getGcpTableId().get())));
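
As I understand it, the alternative to useBeamSchema() is to hand BigQueryIO an explicit schema via withSchema() (TableSchema and TableFieldSchema come from com.google.api.services.bigquery.model). A sketch of what that would look like, not what my pipeline currently does; presumably the new column would have to be listed here by hand:

    // Sketch only: supplying the table schema explicitly instead of
    // inferring it with useBeamSchema(). Types/modes assumed for
    // illustration; a new column must appear in this list for
    // ALLOW_FIELD_ADDITION to pick it up.
    TableSchema explicitSchema = new TableSchema().setFields(ImmutableList.of(
            new TableFieldSchema().setName("First_name").setType("STRING").setMode("NULLABLE"),
            new TableFieldSchema().setName("Last_name").setType("STRING").setMode("NULLABLE")));
    // ... then .withSchema(explicitSchema) in place of .useBeamSchema() ...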

Any ideas on why new fields are not being added?

  • Can you share the related pipeline code, expanding `BigqueryIO` class body? – Nick_Kh Jul 02 '20 at 11:39
  • @mk_sta, I've added the pipeline code that writes to BigQuery – artofdoe Jul 02 '20 at 12:45
  • Did you define Field_name? `If you specify the schema in a JSON file, the new columns must be defined in it. If the new column definitions are missing, the following error is returned when you attempt to append the data: Error while reading data, error message: parsing error in row starting at position int: No such field: field.` (see the schema-file sketch after these comments) https://cloud.google.com/bigquery/docs/managing-table-schemas#adding_a_column_in_a_load_append_job – Peter Kim Jul 02 '20 at 20:18
  • As long as you are loading data over `jobs.insert` method, the solution from @Peter Kim sounds to me reasonable. Did you specify a [schema](https://cloud.google.com/bigquery/docs/schemas#specifying_a_json_schema_file) in the input file? – Nick_Kh Jul 06 '20 at 07:31
  • @artofdoe, did you solve your problem? – Pascal GILLET Jun 07 '21 at 17:04
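
For reference, the schema-file sketch mentioned in Peter Kim's comment: per the quoted docs, a JSON schema file supplied with an append job must also define the new column, roughly like this (the new column's type and mode are assumed here for illustration):

    [
      {"name": "First_name", "type": "STRING", "mode": "NULLABLE"},
      {"name": "Last_name", "type": "STRING", "mode": "NULLABLE"},
      {"name": "Field_name", "type": "STRING", "mode": "NULLABLE"}
    ]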

0 Answers