3

I have a hive table with columns id(String),val(String) as :

id,val
abc,{0|1|0}
abc,{0|1|1}
abc,{1|0|1|1}

I want to add the val column grouping by id column. Expected result is :

id,val
abc,{1|2|2|1}

This result would be got by adding the arrays in parallel.

I have tried with using lateral view explode and then casting as int and so on. But not able to get the expected result. I know using UDF is also an option, but is there any other way in hive only.

Any suggestions would be helpful.

Thanks

Pooja Nayak
  • 182
  • 1
  • 4
  • 11

3 Answers3

2

Start with replacing the {,} with blanks, split the string and use lateral view with posexplode to sum up the numbers in the same position.

select id,pos,sum(split_val) as total
from lateral view posexplode(split(regexp_replace(val,'[{}]',''),'\\|')) tbl as pos,split_val
group by id,pos

Then use collect_list to generate the final array.

select id,collect_list(total)
from (select id,pos,sum(split_val) as total
      from lateral view posexplode(split(regexp_replace(val,'[{}]',''),'\\|')) tbl as pos,split_val
      group by id,pos
     ) t
group by id
Vamsi Prabhala
  • 48,685
  • 4
  • 36
  • 58
1

This is one possible way, there could be better ones

select * from tbl1;

+----------+------------+--+
| tbl1.id  |  tbl1.val  |
+----------+------------+--+
| abc      | {0|1|0}    |
| abc      | {0|1|1}    |
| abc      | {1|0|1|1}  |
+----------+------------+--+

Write it somewhere without {}

insert overwrite directory '/user/cloudera/tbl2' 
row format delimited fields terminated by ','
select id, substr(val,2,length(val)-2) as val2 from tbl1

Create a table to use it

create external table tbl3(id string, val array<int>)
row format delimited
fields terminated by ','
collection items terminated by '|'
location '/user/cloudera/tbl2'

+----------+------------+--+
| tbl3.id  |  tbl3.val  |
+----------+------------+--+
| abc      | [0,1,0]    |
| abc      | [0,1,1]    |
| abc      | [1,0,1,1]  |
+----------+------------+--+

Use posexplode

select id, collect_list(val) 
from (
  select id, sum(c) as val 
    from (
      select id, i, c from tbl3 
      lateral view posexplode(val) v1 as i, c 
    ) tbl 
  group by id, i
  ) tbl2 
group by id

the result

+------+------------+--+
|  id  |    _c1     |
+------+------------+--+
| abc  | [1,2,2,1]  |
+------+------------+--+
Bala
  • 11,068
  • 19
  • 67
  • 120
0

Hive table mytab :

+----------+------------+
|    id    |     val    |
+----------+------------+
|   abc    | {0|1|0}    |
|   abc    | {0|1|1}    |
|   abc    | {1|0|1|1}  |
+----------+------------+

Expected output :

+----------+------------+
|    id    |     val    |
+----------+------------+
|   abc    | {1|2|2|1}  |
+----------+------------+

Hive query used :

select id,concat('{',concat_ws('|',(collect_list(cast(cast(expl_val_sum as int)as string)))),'}') as coll_expl_val 
from(
select id,index,sum(expl_val) as expl_val_sum
from mytab 
lateral view posexplode(split(regexp_replace(val,'[{}]',''),'\\|')) exp as index,expl_val
group by id,index)a
group by id;

1.First posexplode is used which explodes the array[String].
2.Then based on the index column the array values are added up parallelly.
3.Then cast as int is used to convert from decimal values to integer.
4.Then cast as String and then again converted to array[string] using collect_list.
5.Next the values of array are '|' delimited using concat_ws function.
6.Next concat function is used to append '{' and '}'.

Thanks for all your replies.

Pooja Nayak
  • 182
  • 1
  • 4
  • 11