1

Table with 3 columns: order id, member id, order date

enter image description here

Need to pull the distribution of orders broken down by No. of days b/w 2 consecutive orders by member id

What I have is this:

SELECT 
  a1.member_id,
  count(distinct a1.order_id) as num_orders, 
  a1.order_date, 
  DATEDIFF(DAY, a1.order_date, a2.order_date) as days_since_last_order
from orders as a1 
inner join orders as a2 
  on a2.member_id = a1.member_id+1;

It's not helping me completely as the output I need is:

enter image description here

GMB
  • 216,147
  • 25
  • 84
  • 135
RashItIs
  • 87
  • 1
  • 1
  • 10
  • What is the logic to compute `num_orders`? I don't get it. – GMB Oct 23 '20 at 23:19
  • Sorry, that is count() – RashItIs Oct 23 '20 at 23:21
  • What is the logic? For example, for customer `22222`, how do you get `2` for orders `1212` and `1215`, and `1` for the two following orders? – GMB Oct 23 '20 at 23:22
  • It's order_id for an item type. A person can order 2 quantities of the same item, like 2 coffees. Order_id for coffee would be the same, example: 138 but num_orders would 2 – RashItIs Oct 24 '20 at 00:09
  • But that is not mandatory, what I want to calculate is "days since last order" column by member_id – RashItIs Oct 24 '20 at 00:10

2 Answers2

0

You can use lag() to get the date of the previous order by the same customer:

select o.*,
    datediff(
        order_date,
        lag(order_date) over(partition by member_id order by order_date, order_id)
    ) days_diff
from orders o

When there are two rows for the same date, the smallest order_id is considered first. Also note that I fixed your datediff() syntax: in Hive, the function just takes two dates, and no unit.

I just don't get the logic you want to compute num_orders.

GMB
  • 216,147
  • 25
  • 84
  • 135
  • 1
    Thanks for sharing that, I am new to lag function so was trying self join. Your query actually helped, it works. I think I will calculate num_orders on top of this query/table – RashItIs Oct 24 '20 at 00:39
0

May be something like this:

SELECT 
  a1.member_id,
  count(distinct a1.order_id) as num_orders, 
  a1.order_date, 
  DATEDIFF(DAY, a1.order_date, a2.order_date) as days_since_last_order
from orders as a1 
inner join orders as a2 
  on a2.member_id = a1.member_id
where not exists (
      select intermediate_order
      from orders as intermedite_order 
      where intermediate_order.order_date < a1.order_date and intermediate_order.order_date > a2.order_date) ;