4

I have two columns, one of products, and one of the dates they were bought. I am able to order the dates by applying the sort_array(dates) function, but I want to be able to sort_array(products) by the purchase date. Is there a way to do that in Hive?

Tablename is

ClientID    Product    Date
100    Shampoo    2016-01-02
101    Book    2016-02-04
100    Conditioner    2015-12-31
101    Bookmark    2016-07-10
100    Cream    2016-02-12
101    Book2    2016-01-03

Then, getting one row per customer:

select
clientID,
COLLECT_LIST(Product) as Prod_List,
sort_array(COLLECT_LIST(date)) as Date_Order
from tablename
group by 1;

As:

ClientID    Prod_List    Date_Order
100    ["Shampoo","Conditioner","Cream"]    ["2015-12-31","2016-01-02","2016-02-12"]
101    ["Book","Bookmark","Book2"]    ["2016-01-03","2016-02-04","2016-07-10"]

But what I want is the order of the products to be tied to the correct chronological order of purchases.

IDK
  • 53
  • 1
  • 7

1 Answers1

5

It is possible to do it using only built-in functions, but it is not a pretty site :-)

select      clientid
           ,split(regexp_replace(concat_ws(',',sort_array(collect_list(concat_ws(':',cast(date as string),product)))),'[^:]*:([^,]*(,|$))','$1'),',') as prod_list
           ,sort_array(collect_list(date)) as date_order

from        tablename 

group by    clientid
; 

+----------+-----------------------------------+------------------------------------------+
| clientid |             prod_list             |                date_order                |
+----------+-----------------------------------+------------------------------------------+
|      100 | ["Conditioner","Shampoo","Cream"] | ["2015-12-31","2016-01-02","2016-02-12"] |
|      101 | ["Book2","Book","Bookmark"]       | ["2016-01-03","2016-02-04","2016-07-10"] |
+----------+-----------------------------------+------------------------------------------+
David דודו Markovitz
  • 42,900
  • 6
  • 64
  • 88