Here is my problem, I have a table in hive with 2 columns, id and an array of double values. I want to add the double values across rows for a given users. This is how the data looks like. Each array is of same length greater than 100 but for simplicity I have just given 3 here.
+--------+--------------------------
| id | value |
+--------+--------------------------+
| 1 | [0.03,0.15,-0.03] |
| 1 | [-0.2,0.11,-0.16] |
| 1 | [0.03,0.15,-0.03] |
| 2 | [0.02,0.01,0.05] |
| 2 | [0.1,0.03,0.3] |
+--------+--------------------------+
The answer I am expecting is
+--------+--------------------
| id | value |
+--------+---------------------
| 1 | [0.4,0.41,-0.22] |
| 2 | [0.12,0.04,0.35] |
+--------+---------------------
How can I do this using hive query? Thanks in advance
UPDATE: Here is what I used to get a solution. But I am looking for a better solution.
SELECT id, concat_ws(',', collect_list(CAST(val_new AS STRING))) as val_fin FROM (SELECT id, avg(valueid) as val_new from (SELECT id, valueid, index from user_interest_profiles.clicked_articles LATERAL VIEW POSEXPLODE(split(vector,'\\,')) value AS index, valueid )x GROUP BY id, index)x GROUP BY id;
The implementation i used is:
- Explode the array with index
- Average the values with group by id and index
- Concatenate the values across rows using collect_list