1

I have a question about TBLProperties in Hive for OpenCSVSerde and SimpleLazySerDe.

Data file stored in text file (generated by SQOOP)

Table properties

  1. Stored data as OpenCSVSerde
  2. separatorChar by |
  3. quoteChar by "
  4. escapeChar by \\

The problem is null value are display as empty string "". Then I found this

Writing columns having NULL as some string using OpenCSVSerde - HIVE

I've try to follow on that topic, but got one issue if pipe | stored in the content then column will be shifting.

  1. Create OpenCSVSerde table

CREATE TABLE `opencsv_serde`(   
  `a` string,
  `b` string,
  `c` string
)   
ROW FORMAT SERDE    
  'org.apache.hadoop.hive.serde2.OpenCSVSerde'  
WITH SERDEPROPERTIES (  
  'quoteChar'='\"',     
  'separatorChar'='|',
  'escapeChar'='\\'
)   
STORED AS INPUTFORMAT   
  'org.apache.hadoop.mapred.TextInputFormat'    
OUTPUTFORMAT    
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'  
  1. Create LazySimpleSerDe table (I think quoteChar won't work)
CREATE TABLE `lazysimple_serde`(    
  `a` string,
  `b` string,
  `c` string
)   
ROW FORMAT SERDE    
    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
WITH SERDEPROPERTIES (  
  'field.delim'='|',    
  'serialization.format'='|',
  'escapeChar'='\\',
  'quoteChar'='\"'
)

STORED AS INPUTFORMAT   
  'org.apache.hadoop.mapred.TextInputFormat'    
OUTPUTFORMAT    
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'  
TBLPROPERTIES('serialization.null.format'='');
  1. Insert data into opencsv_serde
insert into opencsv_serde
select "a|a", "b", '"c|c"' union all
select "d|d", "e", null;
  1. Select data from opencsv_serde result from opencsv_serde

  2. Insert data from opencsv_serde to lazysimple_serde

insert into lazysimple_serde
select * from opencsv_serde
  1. Check result from lazysimple_serde result from lazysimple_serde

Found column shifting in LazySimpleSerDe

I need to know about how to stored null in OpenCSVSerDe or prevent column shifting in LazySimpleSerDe

Thank you

0 Answers0