I have a question about TBLProperties in Hive for OpenCSVSerde and SimpleLazySerDe.
Data file stored in text file (generated by SQOOP)
Table properties
- Stored data as OpenCSVSerde
- separatorChar by
|
- quoteChar by
"
- escapeChar by
\\
The problem is null
value are display as empty string ""
. Then I found this
Writing columns having NULL as some string using OpenCSVSerde - HIVE
I've try to follow on that topic, but got one issue if pipe |
stored in the content then column will be shifting.
- Create OpenCSVSerde table
CREATE TABLE `opencsv_serde`(
`a` string,
`b` string,
`c` string
)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'quoteChar'='\"',
'separatorChar'='|',
'escapeChar'='\\'
)
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
- Create LazySimpleSerDe table (I think quoteChar won't work)
CREATE TABLE `lazysimple_serde`(
`a` string,
`b` string,
`c` string
)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'='|',
'serialization.format'='|',
'escapeChar'='\\',
'quoteChar'='\"'
)
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
TBLPROPERTIES('serialization.null.format'='');
- Insert data into
opencsv_serde
insert into opencsv_serde
select "a|a", "b", '"c|c"' union all
select "d|d", "e", null;
Select data from
opencsv_serde
result from opencsv_serdeInsert data from
opencsv_serde
tolazysimple_serde
insert into lazysimple_serde
select * from opencsv_serde
- Check result from
lazysimple_serde
result from lazysimple_serde
Found column shifting in LazySimpleSerDe
I need to know about how to stored null in OpenCSVSerDe or prevent column shifting in LazySimpleSerDe
Thank you