1
i/p:

c1                        c2                        c3
[[1,2,3],[4],[5,6]]       ['v1','v2','v3']          [['sam'], ['tam'], ['bam']] 

o/p:

c1                        c2                        c3
[1,2,3]                   'v1'                      ['sam']
[4]                       'v2'                      ['tam']
[5,6]                     'v3'                      ['bam']

Can someone please suggest how do I go about writing a query for the above problem ?

leftjoin
  • 36,950
  • 8
  • 57
  • 116

2 Answers2

2

Use posexplode():

with your_data as (
select array(array(1,2,3),array(4),array(5,6)) c1, array('v1','v2','v3') c2, array(array('sam'), array('tam'), array('bam')) c3
--returns [[1,2,3],[4],[5,6]]  ["v1","v2","v3"]  [["sam"],["tam"],["bam"]]
)

select a1.c1, a2.c2, a3.c3
  from your_data d 
       lateral view posexplode(d.c1) a1 as p1, c1
       lateral view posexplode(d.c2) a2 as p2, c2
       lateral view posexplode(d.c3) a3 as p3, c3
 where a1.p1=a2.p2 and a1.p1=a3.p3 --match positions in exploded arrays
 --without this where condition
 --lateral views will produce cartesian product
 --alternatively you can explode arrays in subqueries and join them
 --using positions, in such way you can do left-join, not only inner
 ;

Result:

OK
c1      c2      c3
[1,2,3] v1      ["sam"]
[4]     v2      ["tam"]
[5,6]   v3      ["bam"]
Time taken: 0.078 seconds, Fetched: 3 row(s)

Simplified version, thanks to @GrzegorzSkibinski for the suggestion:

with your_data as (
    select array(array(1,2,3),array(4),array(5,6)) c1, array('v1','v2','v3') c2, array(array('sam'), array('tam'), array('bam')) c3
    --returns [[1,2,3],[4],[5,6]]  ["v1","v2","v3"]  [["sam"],["tam"],["bam"]]
    )

    select a1.c1, d.c2[a1.p1] as c2,  d.c3[a1.p1] as c3
      from your_data d 
           lateral view posexplode(d.c1) a1 as p1, c1
     ;
leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • 1
    Thinking out loud- if you would just ```posexplode``` the first column, and then reference the remaining ones with given exploded position of the first columnlike ```d.c2[p1] as c2``` you would drop the whole ```where``` clause altogether then ;) – Grzegorz Skibinski Dec 27 '19 at 17:23
  • @GrzegorzSkibinski makes sense. And even more: do not need to explode c2 and c3 – leftjoin Dec 27 '19 at 18:26
1

Use explode:

select explode(c1) as c1 from tab;

Or use with lateral view if your use case is more complex:

select
    c1_exploded,
    a,b,c
from
    tab t
lateral view explode(t.c1) tf as c1_exploded
;

Ref: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

Grzegorz Skibinski
  • 12,624
  • 2
  • 11
  • 34
  • using explode directly will not work because - 'Only a single expression in the SELECT clause is supported with UDTF's' Using later view will do a cross join kind of thing and it wont be one to one mapping ....... so on using ur 2nd approach, i'll get 3^3 rows instead of just 3 rows – pirate_shady Dec 27 '19 at 17:13
  • Ou, sorry I didn't realize ```c1```, ```c2``` and ```c3``` are different columns of the same table. I thought them being 3 different tables – Grzegorz Skibinski Dec 27 '19 at 17:16
  • Is such case ```posexplode``` indeed would be the best option. – Grzegorz Skibinski Dec 27 '19 at 17:17