I am trying to create an external Spectrum table on top of plain-text files, but some values come back as null because they contain special (non-ASCII) characters.
Create statement:
create external table s.table_1 (
    id bigint,
    city varchar(100)
)
partitioned by (file_date varchar(25))
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties ('quoteChar'='"', 'separatorChar'=';', 'serialization.format'='1')
stored as
    inputformat 'org.apache.hadoop.mapred.TextInputFormat'
    outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location 's3://my_bucket/my_prefix/'
table properties ('skip.header.line.count'='1');
But this is what I get when selecting from the table:
id,city
1,<null>
2,<null>
3,Erfurt
4,Erfurt
5,Lahr
Instead of the expected:
id,city
1,Mönchengladbach
2,Mönchengladbach
3,Erfurt
4,Erfurt
5,Lahr
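The nulls appear exactly on the rows whose city names contain non-ASCII characters (the ö in Mönchengladbach), which makes me suspect an encoding mismatch: Spectrum presumably reads the files as UTF-8 while mine may be Latin-1 / ISO 8859-1. A minimal sketch of why that byte sequence would fail to decode (the string literal is just the example value from my data):

```python
# "Mönchengladbach" encoded as Latin-1 is NOT valid UTF-8:
# 'ö' becomes the single byte 0xF6, which UTF-8 rejects mid-stream.
text = "Mönchengladbach"
latin1_bytes = text.encode("latin-1")      # b'M\xf6nchengladbach'

try:
    latin1_bytes.decode("utf-8")
    print("decoded fine")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err)

# Decoding with the right charset recovers the original value.
assert latin1_bytes.decode("latin-1") == "Mönchengladbach"
```

So a reader that assumes UTF-8 would choke on exactly these rows, which matches the pattern of nulls I see.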
I tried to:
- set serdeproperties 'serialization.format' = '1', 'UTF-8', 'Latin-1', or 'ISO 8859-1'
- set row format delimited fields terminated by ';' and change the serialization.format of the table properties
but none of the above worked. I am just copying the files as-is, and it would suit me to load them without having to transform them in advance. Any suggestions? Thank you in advance.
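For reference, the pre-transformation I am trying to avoid is essentially this re-encode step before uploading to S3 (file names here are placeholders, and I am assuming the source files really are Latin-1):

```python
def reencode_latin1_to_utf8(src_path: str, dst_path: str) -> None:
    """Rewrite a Latin-1 text file as UTF-8."""
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(raw.decode("latin-1").encode("utf-8"))

# Demo on a throwaway file containing the problem value.
with open("data.csv", "wb") as f:                  # placeholder name
    f.write("id;city\n1;Mönchengladbach\n".encode("latin-1"))
reencode_latin1_to_utf8("data.csv", "data_utf8.csv")
```

I would rather find a table-level setting than run this over every incoming file.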