I am using the BigQuery Cloud SDK CLI to attempt to do the same. I am utilizing the flag --null_marker="\N"
and --source_format NEWLINE_DELIMITED_JSON
to load data. Here is the error message I get:
BigQuery error in load operation: Only CSV imports may specify a null marker.
So it appears that null markers are not supported for BigQuery imports in JSON format. Pretty disappointing for me.
Here is the exact code and sample I used to test and verify this:
this is a file called example.json
where I want the \N
character to represent the NULL marker.
{"col1": "1", "col2": "A Fat Man", "col3": "3.5", "col4": "SOme moRE TExt"}
{"col1": "2", "col2": "\N", "col3": "4.2", "col4": "texttexttext"}
{"col1": "3", "col2": "\N", "col3": "\N", "col4": "JUSTTEXT"}
{"col1": "4", "col2": "uno", "col3": "2.0", "col4": ""}
this is the command to make the table (my dataset is temp14):
bq mk temp14.null_json_test col1:INTEGER,col2:STRING,col3:FLOAT,col4:STRING
and here is the command to load the data which did not work :(
bq load --source_format NEWLINE_DELIMITED_JSON --null_marker="\N" temp14.null_json_test ./example.json col1:INTEGER,col2:STRING,col3:FLOAT,col4:STRING
I can replace the \N
in the JSON with the empty string like
{"col1": "1", "col2": "A Fat Man", "col3": "3.5", "col4": "SOme moRE TExt"}
{"col1": "2", "col2": "", "col3": "4.2", "col4": "texttexttext"}
{"col1": "3", "col2": "", "col3": "", "col4": "JUSTTEXT"}
{"col1": "4", "col2": "uno", "col3": "2.0", "col4": ""}
and it will load if I remove the flag:
bq load --source_format NEWLINE_DELIMITED_JSON temp14.null_json_test ./example.json col1:INTEGER,col2:STRING,col3:FLOAT,col4:STRING
But the strings load into the table as "" instead of null
and I need them to be NULL. The FLOAT field correctly assigns NULL, but the STRING does not.
My work around is to pre-process the empty strings out of my dataset in a staging area which is kind of like a scratch space, then have my final table represent that data with something like this:
SELECT
col1
, CASE WHEN col2 = '' THEN NULL ELSE col2 END as col2
, col3
, CASE WHEN col4 = '' THEN NULL ELSE col4 END as col4
FROM
temp14.null_json_test
I cannot use CSV for my data sadly and need JSON to work but also need NULL values to be represented correctly.
EDIT:
After coming back to this I did discover that JSON formatted data will allow for NULLs, but not with a null marker. You simply need to omit the key/value pair for JSON data to load to BigQuery as NULL. So the example.json
should look like this to nullify those above fields where the \N
used to be.
{"col1": "1", "col2": "A Fat Man", "col3": "3.5", "col4": "SOme moRE TExt"}
{"col1": "2", "col3": "4.2", "col4": "texttexttext"}
{"col1": "3", "col4": "JUSTTEXT"}
{"col1": "4", "col2": "uno", "col3": "2.0", "col4": ""}
And I verified that the above does work with no additional flags required, as well as with external tables.