11

I have a view in Hive 1.1.0, based on a condition, it should return an empty array or an array of struct<name: string, jobslots: int>

Here is my code:

select
      case when <condition> 
             then array()
           else array(struct(t1.name, t1.jobslots))
       end
from table t1;

The problem here is, that the empty array array() is of type array<string>. So when I try to insert it into a table, it throws an error.

How can I change this to return an empty array of type array<struct<name: string, jobslots:int>> so that Hive's size() function returns 0 on this array?

leftjoin
  • 36,950
  • 8
  • 57
  • 116
Joha
  • 935
  • 12
  • 32

1 Answers1

0

You can do it using collect_list or colect_set for collecting an array of structs taken from a join, and a join condition is false, then collect_list will produce an empty array of struct.

This query returns array of 0 size:

select a.id, size(collect_list(b.str))=0 array_size_zero
from
(select 2 id ) a
  left join (select 1 as id, named_struct('name',null,'value',null) as str) b
            on a.id=b.id
group by a.id

Result:

a.id    array_size_zero
2       true

If you change id in the first subquery a to join with b, it will return array with 1 element. And these results are of the same type, you can check it easily using union all.

Check results are of the same type:

select a.id, collect_list(b.str) my_array
from
(select 1 id ) a
  left join (select 1 as id, named_struct('name',null,'value',null) as str) b
            on a.id=b.id
group by a.id

union all

select a.id, collect_list(b.str) my_array
from
(select 2 id ) a
  left join (select 1 as id, named_struct('name',null,'value',null) as str) b
            on a.id=b.id
group by a.id  

Result:

id  my_array
1   [{"name":null,"value":null}]
2   []

What happens if I try to UNION ALL empty array of struct with different type, for example array():

select 1 id, array() my_array

union all

select a.id, collect_list(b.str) my_array
from
(select 2 id ) a
  left join (select 1 as id, named_struct('name',null,'value',null) as str) b
            on a.id=b.id
group by a.id

Exception:

Error while compiling statement: FAILED: SemanticException Schema of both sides of union should match: Column my_array is of type array on first table and type array<structname:void,value:void> on second table. Cannot tell the position of null AST.

This demonstrates that first query really returns empty array of struct. You can easily do similar join in your query.

How can you use it in your query with condition? Demo:

select a.id, case when true --Put your condition here instead of dummy <true> 
                     then collect_list(a.str) --not empty
                  else collect_list(b.str) --this one is empty array of the same type
              end as my_array
from
(select 2 id, named_struct('name',null,'value',null) str) a
  left join (select 1 as id, named_struct('name',null,'value',null) as str) b
            on FALSE
group by a.id

CASE expression is quite happy and does not raise exception about incompatible types

leftjoin
  • 36,950
  • 8
  • 57
  • 116