Select statement returns bad characters in Impala.
The first image shows the result from Hive and the second shows the result from Impala. It is a managed table created in Hive; the source table is external.

Asad ch
- Are you doing a substring or any string manipulation in your query to Impala for this column? – JNevill Nov 06 '19 at 14:45
- @JNevill No, I’m just executing `select * from table`. – Asad ch Nov 06 '19 at 14:53
- Generally speaking, Impala *can* store and display Unicode, but support is pretty limited (which seems kind of dumb). Essentially it treats `STRING` as a byte array, so it only recognizes single bytes, whereas non-ASCII characters in UTF-8 take 2, 3, or 4 bytes. So while it should store and display Unicode (by accident), it can throw up all over itself in certain circumstances. [Here](https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_string.html) it says it will trip up during: *"String manipulation functions, Comparison operators, The ORDER BY clause. Values in partition key columns."* – JNevill Nov 06 '19 at 14:58
- @JNevill Is there any workaround? – Asad ch Nov 06 '19 at 15:04
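
To make the "byte array" point in the comments concrete, here is a minimal Python sketch. It is only an analogy: Python `bytes` stand in for a byte-oriented `STRING`, and nothing here calls Impala or reproduces the plain `select *` case from the question. It just shows why a byte-based length or substring can split a multi-byte UTF-8 character.

```python
# Illustration only: Python bytes stand in for a byte-oriented STRING type.
text = "привет"
raw = text.encode("utf-8")

char_count = len(text)  # counts characters
byte_count = len(raw)   # counts bytes; each Cyrillic letter is 2 bytes in UTF-8

print(char_count)  # 6
print(byte_count)  # 12

# A byte-based "first 3" cuts the second character in half.
head = raw[:3]
broken = head.decode("utf-8", errors="replace")
print(broken)  # 'п' followed by the replacement character U+FFFD
```

Cutting on a character boundary instead (`raw[:2]` or `raw[:4]` here) decodes cleanly, which is why byte-oriented string functions are only safe on multi-byte text when the offsets happen to line up.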
1 Answer
I had a similar issue, and this link helped me: https://community.cloudera.com/t5/Support-Questions/Hive-UTF-8-problems/td-p/172558
On insertion, use something like this:
insert into test select 'привет' from test limit 1;

Alex