
I am using \N as the null marker (nullMarker: "\N") in the JSON table definition for external tables, for both INTEGER and DATETIME columns, but BigQuery is not able to parse it and throws the error below:

    Could not parse '\N' as int for field [Field Name] (position 2) starting at location 0

Could anyone please help me, what can be the possible fix?

  • Adding to the above question: I am facing this issue with a federated source. – flametech Feb 01 '19 at 14:30
  • Sorry, ignore my answer; you are using federated tables, which is not the same as loading data into BigQuery. Nonetheless, the issue persists, and null markers appear to be valid only for CSV files. – Roman Czerwinski May 24 '19 at 21:55

1 Answer

I am using the BigQuery CLI (bq) to attempt the same thing, passing the flags --null_marker="\N" and --source_format NEWLINE_DELIMITED_JSON to load the data. Here is the error message I get:

BigQuery error in load operation: Only CSV imports may specify a null marker.

So it appears that null markers are not supported for JSON imports to BigQuery. Pretty disappointing.

Here is the exact code and sample I used to test and verify this.

This is a file called example.json in which I want \N to represent the NULL marker:

    {"col1": "1", "col2": "A Fat Man", "col3": "3.5", "col4": "SOme moRE TExt"}
    {"col1": "2", "col2": "\N", "col3": "4.2", "col4": "texttexttext"}
    {"col1": "3", "col2": "\N", "col3": "\N", "col4": "JUSTTEXT"}
    {"col1": "4", "col2": "uno", "col3": "2.0", "col4": ""}

This is the command to create the table (my dataset is temp14):

    bq mk temp14.null_json_test col1:INTEGER,col2:STRING,col3:FLOAT,col4:STRING

And here is the command to load the data, which did not work:

    bq load --source_format NEWLINE_DELIMITED_JSON --null_marker="\N" temp14.null_json_test ./example.json col1:INTEGER,col2:STRING,col3:FLOAT,col4:STRING

I can replace the \N in the JSON with an empty string, like so:

    {"col1": "1", "col2": "A Fat Man", "col3": "3.5", "col4": "SOme moRE TExt"}
    {"col1": "2", "col2": "", "col3": "4.2", "col4": "texttexttext"}
    {"col1": "3", "col2": "", "col3": "", "col4": "JUSTTEXT"}
    {"col1": "4", "col2": "uno", "col3": "2.0", "col4": ""}

and it will load if I remove the flag:

    bq load --source_format NEWLINE_DELIMITED_JSON temp14.null_json_test ./example.json col1:INTEGER,col2:STRING,col3:FLOAT,col4:STRING

But the strings load into the table as "" instead of NULL, and I need them to be NULL. The FLOAT field is correctly set to NULL, but the STRING fields are not.

My workaround is to pre-process the empty strings out of my dataset in a staging table, which serves as a scratch space, and then have my final table represent the data with something like this:

    SELECT 
      col1
      , CASE WHEN col2 = '' THEN NULL ELSE col2 END as col2
      , col3
      , CASE WHEN col4 = '' THEN NULL ELSE col4 END as col4
    FROM
      temp14.null_json_test
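
The same rewrite can also be expressed with NULLIF, which BigQuery Standard SQL supports and which reads a bit more compactly:

    SELECT
      col1
      , NULLIF(col2, '') AS col2
      , col3
      , NULLIF(col4, '') AS col4
    FROM
      temp14.null_json_test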

Sadly, I cannot use CSV for my data; I need JSON to work while also representing NULL values correctly.

EDIT: After coming back to this, I discovered that JSON-formatted data does allow NULLs, just not via a null marker. You simply omit the key/value pair, and BigQuery loads that field as NULL. So example.json should look like this to nullify the fields where \N used to be:

    {"col1": "1", "col2": "A Fat Man", "col3": "3.5", "col4": "SOme moRE TExt"}
    {"col1": "2", "col3": "4.2", "col4": "texttexttext"}
    {"col1": "3", "col4": "JUSTTEXT"}
    {"col1": "4", "col2": "uno", "col3": "2.0", "col4": ""}

And I verified that the above works with no additional flags required, as well as with external tables.
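
If you have many files, the key-omission trick above can be automated with a small pre-processing script. The sketch below is a minimal Python example (the function name strip_null_markers is my own choice, not anything BigQuery-specific); it assumes the marker is stored in valid JSON as the escaped two-character sequence \\N, since strict JSON parsers reject a bare \N escape:

```python
import json

# Rewrite one NDJSON line so that any field whose value equals the \N
# marker is omitted entirely; BigQuery then loads the missing field as NULL.
# Python "\\N" is the two-character string backslash + N.
def strip_null_markers(line, marker="\\N"):
    record = json.loads(line)
    return json.dumps({k: v for k, v in record.items() if v != marker})

lines = [
    '{"col1": "1", "col2": "A Fat Man", "col3": "3.5"}',
    '{"col1": "2", "col2": "\\\\N", "col3": "4.2"}',
]
for line in lines:
    print(strip_null_markers(line))  # the \N-marked field is dropped
```

Note that genuinely empty strings are left alone; only fields matching the marker are removed, so the "" vs NULL distinction from above is preserved.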

Roman Czerwinski