
I have an external table with the DDL below:

CREATE EXTERNAL TABLE `table_1`(
  `name` string COMMENT 'from deserializer', 
  `desc1` string COMMENT 'from deserializer', 
  `desc2` string COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES ( 
  'quoteChar'='\"', 
  'separatorChar'='|', 
  'skip.header.line.count'='1') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://temp_loc/temp_csv/'
TBLPROPERTIES (
  'classification'='csv', 
  'compressionType'='none', 
  'typeOfData'='file')

The CSV files that this table reads are UTF-16 LE encoded. When I try to render the output in Athena, the special characters are displayed as question marks. Is there any way to set the encoding in Athena, or otherwise fix this?

Infinite
  • Did you look into [lazySimpleSerde](https://docs.aws.amazon.com/athena/latest/ug/lazy-simple-serde.html)? It supports `serialization.encoding` which may help, also see this [post](https://stackoverflow.com/questions/36283001/hive-utf-8-encoding-number-of-characters-supported) – Philipp Johannis Oct 02 '20 at 19:25

1 Answer


The solution, as Philipp Johannis mentions in a comment, is to use LazySimpleSerDe and set the `serialization.encoding` table property to `UTF-16LE`. As far as I can tell, LazySimpleSerDe resolves this property with `java.nio.charset.Charset.forName`, so any encoding/charset name accepted by Java should work.
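A minimal sketch of what the revised DDL could look like, reusing the table columns and S3 location from the question. Note the assumptions: LazySimpleSerDe takes `field.delim` rather than OpenCSVSerde's `separatorChar`, it has no `quoteChar` support, and `skip.header.line.count` is moved to `TBLPROPERTIES`:

```sql
-- Sketch: same table as in the question, but with LazySimpleSerDe so that
-- 'serialization.encoding' takes effect (OpenCSVSerde does not support it).
CREATE EXTERNAL TABLE `table_1`(
  `name` string,
  `desc1` string,
  `desc2` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim'='|',
  -- any charset name accepted by java.nio.charset.Charset.forName
  'serialization.encoding'='UTF-16LE')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://temp_loc/temp_csv/'
TBLPROPERTIES (
  'skip.header.line.count'='1',
  'classification'='csv')
```

Since LazySimpleSerDe cannot strip quote characters, this assumes the fields in the files are not quoted; quoted fields would keep their quotes in the query output.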

Theo