2

I have a function that recieves JSON data from API server and upload it to BigQuery table with specified schema. JSON data has some fields that should be STRING but contain only digits. Here is a sample of json data:

{
"realizationreport_id":5200242,
"suppliercontract_code":null,
"rrd_id":4681869162,
"gi_id":3246258,
"subject_name":"\xd0\x9a\xd0\xb0\xd0\xbb\xd1\x8c\xd1\x81\xd0\xbe\xd0\xbd\xd1\x8b",
"nm_id":44185221,          <--- serialized to INTEGER but should be STRING
"barcode":"2010739265043"  <--- serialized to INTEGER but should be STRING
........
}

Table Schema:

      bigquery.SchemaField("realizationreport_id", "INTEGER", mode="NULLABLE"),
      bigquery.SchemaField("suppliercontract_code", "STRING", mode="NULLABLE"),
      bigquery.SchemaField("rrd_id", "INTEGER", mode="REQUIRED"),
      bigquery.SchemaField("gi_id", "INTEGER", mode="NULLABLE"),
      bigquery.SchemaField("subject_name", "STRING", mode="NULLABLE"),
      bigquery.SchemaField("nm_id", "STRING", mode="NULLABLE"), #integer
      bigquery.SchemaField("barcode", "STRING", mode="NULLABLE"),#integer
      ...................

Uploading to Table

job_config = bigquery.LoadJobConfig(
                schema = wb_options.get_schema(),                
                autodetect = False
                )

loadJob = client.load_table_from_json(json_result, table, job_config=job_config)  
loadJob.result()

When JSON starts dump data it meets INTEGER value for the STRING table field

JSON parsing error in row starting at position 0: Could not convert value to string. Field: nm_id; Value: 44185221

How can I change json decode function for a particular field like nm_id to be STRING?

Should I do it somehow by adding to metadata map {default : CustomDecode} for json.dumps(metadata).encode("utf-8").

Or there is needed functionality in bigquery.LoadJobConfig?

Lena
  • 51
  • 9
  • 3
    in sample json data, barcode has double quotes whereas nm_id doesn't. Does it suit your use-case to be able to wrap nm_id in double quotes in input json so bigquery treats it as string? – gawkface Nov 17 '21 at 18:48
  • 2
    Behavior for barcode and nm_id is the same, they are both converted to NUMBER. So I guess double quotes will not help :(( This fields are set to STRING according to an documenation of the API service. – Lena Nov 17 '21 at 18:58
  • 3
    The JSON definition consider the field as a number, you ask for a string, it failed. Either add double quote around the field value to transform it in String in the JSON, or change your BigQUery schema to accept a number. – guillaume blaquiere Nov 17 '21 at 19:28
  • 1
    Thats right. But I need to change the behavior to decode integer field to string. I don't know why it should be string, perhaps for future reasons. And JSON definition consider double quoted value with only-digits as INTEGER – Lena Nov 17 '21 at 19:37
  • 1
    so what if you change the bigquery.SchemaField("nm_id", "STRING", mode="NULLABLE") to be type "INTEGER" instead? looks like bigquery converts integer to string anyway as per https://stackoverflow.com/questions/32462987/how-to-preserve-integer-data-type-when-exporting-to-json/32468017#32468017 (very old so not sure if relevant). and i understand this suggestion is not an answer but more of a contemplation of your design (which you might not want to do as part of this SO question) – gawkface Nov 17 '21 at 21:41
  • 2
    @gawkface Thanks! Seems like I have to change BigQuery schema and keep eye on this issue – Lena Nov 18 '21 at 05:51
  • 1
    The problem that numeric types could not be converted to string has been fixed. I.e. you can now load NDJSON files with values like `"nm_id":44185221,` using `nm_id:STRING` schema. – Michael Entin Dec 15 '21 at 00:00
  • @Michael Entin bigquery load_table_from_json function converts Json to NDJSON but I had have this error. Was it fixed recently? – Lena Dec 15 '21 at 17:48
  • 1
    Yes, quite recently. – Michael Entin Dec 16 '21 at 04:08

1 Answers1

1

From the LoadJobConfig reference of the BigQuery Python API, I could not find any functionality that would do exactly what you needed. The decimal_target_types was interesting, however it could not be used while also providing a table schema in the request as I tested. It worked correctly as expected when manually setting the JSON integer to strings using the documentation guide, fitting the table schema. I think the way to go is to edit the JSON decode function to automatically convert the relevant properties to strings, as this other thread explores.

ErnestoC
  • 2,660
  • 1
  • 6
  • 19