
I am facing an issue where I have multiple files with different charsets, say one file contains Chinese characters and another contains French characters. How can I load them into a single Hive table? I searched online and found this:

ALTER TABLE mytable SET SERDEPROPERTIES ('serialization.encoding'='SJIS');

With this I can handle the charset of one of the files, either Chinese or French. Is there a way to handle both charsets at once?

[UPDATE]

Okay, I am using RegexSerDe for a fixed-width file, and the encoding scheme being used is ISO-8859-1. It seems RegexSerDe does not take this encoding scheme into account and splits the characters assuming the default UTF-8 encoding. Is there a way to make RegexSerDe honour the encoding scheme?

Paritosh Ahuja

1 Answer


I am not sure if this is possible (I think it isn't, based on https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/AbstractEncodingAwareSerDe.java). A workaround could be to create two tables with different encodings and create a view on top of them.
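
A minimal sketch of that workaround, assuming delimited files (the default LazySimpleSerDe honours serialization.encoding) and hypothetical table/view names (mytable_zh, mytable_fr, mytable_all), directories, and source encodings (GBK and ISO-8859-1):

-- One external table per source encoding, each pointing at its own directory
CREATE EXTERNAL TABLE mytable_zh (col1 STRING, col2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/mytable/zh';
-- Decode this table's files as GBK
ALTER TABLE mytable_zh SET SERDEPROPERTIES ('serialization.encoding'='GBK');

CREATE EXTERNAL TABLE mytable_fr (col1 STRING, col2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/mytable/fr';
-- Decode this table's files as ISO-8859-1
ALTER TABLE mytable_fr SET SERDEPROPERTIES ('serialization.encoding'='ISO-8859-1');

-- A single view over both tables; queries against it see the decoded text from either source
CREATE VIEW mytable_all AS
SELECT col1, col2 FROM mytable_zh
UNION ALL
SELECT col1, col2 FROM mytable_fr;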

hlagos