0

Here is my problem, I have a table in hive with 2 columns, id and an array of double values. I want to add the double values across rows for a given users. This is how the data looks like. Each array is of same length greater than 100 but for simplicity I have just given 3 here.

+--------+--------------------------
| id  |         value         | 
+--------+--------------------------+
| 1      | [0.03,0.15,-0.03]  |
| 1      | [-0.2,0.11,-0.16]  |
| 1      | [0.03,0.15,-0.03]  | 
| 2      | [0.02,0.01,0.05]   | 
| 2      | [0.1,0.03,0.3]     | 
+--------+--------------------------+

The answer I am expecting is

+--------+--------------------
| id  |         value         | 
+--------+---------------------
| 1      | [0.4,0.41,-0.22]   |
| 2      | [0.12,0.04,0.35]   | 
+--------+---------------------

How can I do this using hive query? Thanks in advance

UPDATE: Here is what I used to get a solution. But I am looking for a better solution.

SELECT id, concat_ws(',', collect_list(CAST(val_new AS STRING))) as val_fin FROM (SELECT id, avg(valueid) as val_new from (SELECT id, valueid, index from user_interest_profiles.clicked_articles LATERAL VIEW POSEXPLODE(split(vector,'\\,')) value AS index, valueid )x GROUP BY id, index)x GROUP BY id;

The implementation i used is:

  • Explode the array with index
  • Average the values with group by id and index
  • Concatenate the values across rows using collect_list
abhijit
  • 1
  • 3

0 Answers0