1

Is there a more elegant way of condensing a where statement over a range of variables with indexed names?

For example instead of:

create table table_cpt_43644
as select *
from master_table
where icd_proc_cd_1 = '43644' 
or icd_proc_cd_2 = '43644'
or icd_proc_cd_3 = '43644'
...
or icd_proc_cd_28 = '43644';

use something like the following (which alas doesn't work):

create table table_cpt43644
as select *
from master_table
where icd_proc_cd_1-icd_proc_cd_28 = '43644';
leftjoin
  • 36,950
  • 8
  • 57
  • 116
RobertF
  • 824
  • 2
  • 14
  • 40

1 Answers1

1

A bit shorter using array_contains:

where 
array_contains(
array( icd_proc_cd_1,icd_proc_cd_2,icd_proc_cd_3,icd_proc_cd_4,icd_proc_cd_5,icd_proc_cd_6,icd_proc_cd_7,icd_proc_cd_8,icd_proc_cd_9,icd_proc_cd_10,
       icd_proc_cd_11,icd_proc_cd_12,icd_proc_cd_13,icd_proc_cd_14, icd_proc_cd_15,icd_proc_cd_16,icd_proc_cd_17,icd_proc_cd_18,icd_proc_cd_19,icd_proc_cd_20, 
       icd_proc_cd_21,icd_proc_cd_22,icd_proc_cd_23,icd_proc_cd_24,icd_proc_cd_25,icd_proc_cd_26,icd_proc_cd_27,icd_proc_cd_28
     ), '43644')

If your table is based on CSV file, you can re-define table DDL, use regexSerDe and select icd_proc_cd_1-icd_proc_cd_28 as single comma-separated column. then you could use even shorter solution using array_contains(split(column_concatenated, ','),'43644'). Using rlike in this case is also possible. Though the first solution is more flexible.

leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • Thank you! Second question: If, instead of a single value '43644', I want to check if any member of the array icd_proc_cd_1-icd_proc_cd_28 contains any one of an array of values (say '43644', '43645', '43646'), is there a simple solution instead of multiple or statements? – RobertF Feb 27 '20 at 04:14
  • @RobertF Sure. Download brickhouse library https://github.com/klout/brickhouse There is intersect_array, which will return array of common elements, check the size of array_intersect – leftjoin Feb 27 '20 at 06:25
  • @RobertF after adding Jar, create temporary function : https://github.com/klout/brickhouse/blob/master/src/main/resources/brickhouse.hql and use it in the query – leftjoin Feb 27 '20 at 06:30
  • Thanks - I've never imported new functions into Hive SQL, looks tricky. Do I download the brickhouse library via commands written in the Hive editor? Or do I have to open up a Unix terminal on my Hive server? – RobertF Feb 27 '20 at 14:39
  • Ugh I'm also getting an error: 'Error while compiling statement: FAILED: SemanticException [Error 10016]: line 161:65 Argument type mismatch ''0DV60CZ'': "varchar(11)" expected at function ARRAY_CONTAINS, but "string" is found' when I run the array_contains function. – RobertF Feb 27 '20 at 15:00
  • 1
    @RobertF Try to cast argument to the same type. array_contains(array(...),cast('0DV60CZ' as varchar(11) )). Is it varchar(11) - is the type of your table column, right? – leftjoin Feb 27 '20 at 15:58
  • 1
    @RobertF Instructions how to compile is on brichouse site. You need maven and Java. Alternatively you can try to find ready compiled jar. put it in the HDFS, issue add jar from Hive, create temporary function – leftjoin Feb 27 '20 at 16:01