
I'm using 'org.apache.hadoop.hive.serde2.OpenCSVSerde' to write hive table data.

CREATE TABLE testtable
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "'"
)
STORED AS TEXTFILE LOCATION '<location>' AS
select * from foo;

If 'foo' contains empty strings, e.g. '1','2','', the empty strings are written as-is to the text file: the data in the file reads '1','2',''

But if 'foo' contains null values, e.g. '1','2',null, the null value is not written at all: the data in the file reads '1','2',

How do I make sure that nulls are properly written to the text file with the CSV serde, either as empty strings or as some other string, say "nullstring"?

I also tried this:

CREATE TABLE testtable ROW FORMAT SERDE
....
....
STORED AS TEXTFILE LOCATION '<location>'
TBLPROPERTIES ('serialization.null.format'='')
AS select * from foo;

I expected this to at least make the nulls come out as empty strings, but it doesn't even do that.

Please guide me on how to write nulls to CSV files.

Or will I have to check for null values in the select query itself (with NVL or something) and replace them there?
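
In other words, something like the following (just a sketch; col1, col2, col3 are placeholders for foo's actual columns):

CREATE TABLE testtable
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "'"
)
STORED AS TEXTFILE LOCATION '<location>' AS
select NVL(col1, 'nullstring') as col1,   -- NVL replaces NULL with the given literal
       NVL(col2, 'nullstring') as col2,   -- col1..col3 are placeholder column names
       NVL(col3, 'nullstring') as col3
from foo;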


1 Answer


OpenCSVSerde ignores the 'serialization.null.format' property. You can handle null values with the steps below:

1. CREATE TABLE testtable
    (
    name string,
    title string,
    birth_year string
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
    "separatorChar" = ",",
    "quoteChar"     = "'"
    )
    STORED AS TEXTFILE;

2. Load data into testtable (see the LOAD DATA sketch after step 4).

3. CREATE TABLE testtable1
    (
    name string,
    title string,
    birth_year string
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    TBLPROPERTIES('serialization.null.format'='');

4. INSERT OVERWRITE TABLE testtable1 SELECT * FROM testtable;
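
For steps 2 and 4, a minimal sketch (the path '/tmp/testdata.csv' is only an example; point it at your actual file):

-- Step 2 (sketch): load the source file into testtable.
-- '/tmp/testdata.csv' is a hypothetical HDFS path.
LOAD DATA INPATH '/tmp/testdata.csv' INTO TABLE testtable;

-- Step 4: because testtable1 sets serialization.null.format='',
-- NULL values are serialized as empty strings in its text files.
INSERT OVERWRITE TABLE testtable1 SELECT * FROM testtable;

The text files under testtable1's location will then contain empty fields wherever the source values were NULL.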