I have a DataStage parallel job that writes to Hive as the final stage in a long job. I can view the data that is about to be written and there are many NULL
strings that I want to see in the Hive table.
However, when I view the table that is created, there are no NULL strings, they all get converted into empty strings ''
instead. I can see other datatypes, like DECIMAL(5,0)
have NULL values and I can select these, e.g.
SELECT * FROM mytable WHERE decimal_column IS NULL;
The process for writing to Hive is to store the data in a staging table in a delimited text format. This is then pushed through a generic CDC process and results in data being written to a new partition in an ORC format table.
The only option I can see for handling NULL values is "Null Value" in the HDFS File Connector Stage. If I leave this blank then I get empty strings and if I type in 'NULL' then 'NULL' is what I get, i.e. not a NULL
, but the string 'NULL'.
I can't change the process as it's in place for literally thousands of jobs already. Is there any way to get my string values to be NULL
or am I stuck with empty strings?