I have a table with an nvarchar column in SQL Server 2016 that I want to store in Hive. The nvarchar column can contain non-ASCII characters. The data is extracted from SQL Server into a file, with the nvarchar column converted to a base64-encoded string. I tried the following to convert the base64 back to a readable string:
select decode(unbase64(BASE64STR),'UTF-8');
It failed with the following error:
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments ''UTF-8'': org.apache.hadoop.hive.ql.metadata.HiveException: java.nio.charset.MalformedInputException: Input length = 1
The following code is able to properly decode the base64-encoded string:
select decode(unbase64(BASE64STR),'UTF-16LE');
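The behavior above can be reproduced outside Hive, assuming the export base64-encodes the raw nvarchar column bytes: SQL Server stores nvarchar text as UTF-16LE code units, so the decoded bytes are valid UTF-16LE but generally not valid UTF-8. A minimal Python sketch (the sample string is my own, not from the actual table):

```python
import base64

# nvarchar in SQL Server holds UTF-16LE code units, so base64-encoding
# the raw column bytes produces base64 of UTF-16LE data.
original = "héllo"  # hypothetical non-ASCII sample value
b64 = base64.b64encode(original.encode("utf-16-le")).decode("ascii")

raw = base64.b64decode(b64)

# Decoding as UTF-16LE round-trips the text cleanly.
print(raw.decode("utf-16-le"))  # héllo

# Decoding the same bytes as UTF-8 fails, the Python analogue of
# Hive's java.nio.charset.MalformedInputException.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as e:
    print("utf-8 decode failed:", e.reason)
```

This matches what I see in Hive: `decode(..., 'UTF-16LE')` succeeds while `decode(..., 'UTF-8')` throws.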
Is it safe to use UTF-16LE to decode strings coming from an nvarchar column? Will this work for any data stored in the column? Is there another way to achieve this ETL flow from SQL Server to Hive for nvarchar data?