0

I have a requirement where I need to do case-insensitive joins across the system and I don't wish to apply upper/lower functions. I tried setting TBLPROPERTIES('serialization.encoding'='utf8mb4_unicode_ci') at Table level but still the comparison is happening considering case sensitivity. PSB -

drop table test.caseI; create table test.caseI (name string, id int) TBLPROPERTIES('serialization.encoding'='utf8mb4_unicode_ci'); insert into test.caseI values ('hj',1);

drop table test.caseI_2; create table test.caseI_2 (name string, id int) TBLPROPERTIES('serialization.encoding'='utf8mb4_unicode_ci'); insert into test.caseI_2 values ('HJ',1);

select * from test.caseI i inner join test.caseI_2 i2 on i.name=i2.name; --No Result

Tried with encoding 'SQL_Latin1_General_CP1_CI_AI' but got same result as above.

Any help would be appreciated, thanks!

Olaf Kock
  • 46,930
  • 8
  • 59
  • 90
  • i am trying to understand why you need this - can you not use this `select * from test.caseI i inner join test.caseI_2 i2 on lower(i.name)=lower(i2.name);` – Koushik Roy Jun 09 '22 at 17:00
  • That can be done but the thing is applying functions in join or where clause is a big performance hit and I need to do this in almost 250 scripts running in the batch, which will make the batch perform very slow. That's why I was looking for something at the Database level. – HIMANSHU JAIN Jul 17 '22 at 16:55
  • hmm, i am doubtful if its possible in hive. – Koushik Roy Jul 18 '22 at 10:15
  • Yes, that's not. I checked that with Cloudera Team as well and they responded that as of now it isn't possible in Hive. – HIMANSHU JAIN Jul 19 '22 at 11:04

0 Answers0