4

I would like to identify the returning customers from an Oracle(11g) table like this:

CustID | Date
-------|----------
XC321  | 2016-04-28
AV626  | 2016-05-18
DX970  | 2016-06-23
XC321  | 2016-05-28
XC321  | 2016-06-02

So I can see which customers returned within various windows, for example within 10, 20, 30, 40 or 50 days. For example:

CustID | 10_day | 20_day | 30_day | 40_day | 50_day 
-------|--------|--------|--------|--------|--------
XC321  |        |        |    1   |        |        
XC321  |        |        |        |    1   |        

I would even accept a result like this:

CustID |    Date    | days_from_last_visit
-------|------------|---------------------
XC321  | 2016-05-28 |                   30        
XC321  | 2016-06-02 |                    5

I guess it would use a partition by windowing clause with unbounded following and preceding clauses... but I cannot find any suitable examples. Any ideas...? Thanks

3 Answers3

2

No need for window functions here, you can simply do it with conditional aggregation using CASE EXPRESSION :

SELECT t.custID,
       COUNT(CASE WHEN (last_visit- t.date) <= 10 THEN 1 END) as 10_day,
       COUNT(CASE WHEN (last_visit- t.date) between 11 and 20 THEN 1 END) as 20_day,
       COUNT(CASE WHEN (last_visit- t.date) between 21 and 30 THEN 1 END) as 30_day,
       .....
FROM (SELECT s.custID,
             LEAD(s.date) OVER(PARTITION BY s.custID ORDER BY s.date DESC) as last_visit
      FROM YourTable s) t
GROUP BY t.custID
sagi
  • 40,026
  • 6
  • 59
  • 84
  • I guess in this case instead of `sysdate` you should use `MIN(t.date)` - first user's visit. If I got this question right. – valex Sep 06 '16 at 10:02
  • Hi Valex, I agree the, using the sysdate returns days from today, I need days from previous visit - but using MIN(t.date) returns an error: "not a single-group group function" and this would return days from the first visit not the previous visit... I guess my OP implies days from the first visit so I retract the last bit of my comment - But I would be happy with a solution for either. – user2798561 Sep 06 '16 at 10:29
  • Hi Sagi, that returns a result, but it shows the total count in the first column to match the criteria ie if a customer returns on four dates d, d+2, d+21, d+25 and D+31 then the count in the 10_day column is shown as 4 and the other columns as 0. I would expect 10_day=1, 20_day=0, 30_day=2, 40_day=1, and 50_day=0. – user2798561 Sep 06 '16 at 11:14
  • @user2798561 How 10_day column is 4 and all others 0 when `(d+2)` +10 < `d+21` ? I can imagine 3 (counting d+2; and d+25 , D+31) excluding start of seria or 5 including it (d, d+2; and d+21, d+25 and D+31 ). – Serg Sep 06 '16 at 11:27
  • So I don't understand what you are asking for. You want all the dates to be compared to the first one? Or each date to be compared to the last date? @user2798561 – sagi Sep 06 '16 at 11:31
1

Oracle Setup:

CREATE TABLE customers ( CustID, Activity_Date ) AS
SELECT 'XC321', DATE '2016-04-28' FROM DUAL UNION ALL
SELECT 'AV626', DATE '2016-05-18' FROM DUAL UNION ALL
SELECT 'DX970', DATE '2016-06-23' FROM DUAL UNION ALL
SELECT 'XC321', DATE '2016-05-28' FROM DUAL UNION ALL
SELECT 'XC321', DATE '2016-06-02' FROM DUAL;

Query:

SELECT *
FROM   (
  SELECT CustID,
         Activity_Date AS First_Date,
         COUNT(1) OVER ( PARTITION BY CustID
                         ORDER BY Activity_Date
                         RANGE BETWEEN CURRENT ROW AND INTERVAL '10' DAY FOLLOWING )
           - 1 AS "10_Day",
         COUNT(1) OVER ( PARTITION BY CustID
                         ORDER BY Activity_Date
                         RANGE BETWEEN CURRENT ROW AND INTERVAL '20' DAY FOLLOWING )
           - 1 AS "20_Day",
         COUNT(1) OVER ( PARTITION BY CustID
                         ORDER BY Activity_Date
                         RANGE BETWEEN CURRENT ROW AND INTERVAL '30' DAY FOLLOWING )
           - 1 AS "30_Day",
         COUNT(1) OVER ( PARTITION BY CustID
                         ORDER BY Activity_Date
                         RANGE BETWEEN CURRENT ROW AND INTERVAL '40' DAY FOLLOWING )
           - 1 AS "40_Day",
         COUNT(1) OVER ( PARTITION BY CustID
                         ORDER BY Activity_Date
                         RANGE BETWEEN CURRENT ROW AND INTERVAL '50' DAY FOLLOWING )
           - 1 AS "50_Day",
        ROW_NUMBER() OVER ( PARTITION BY CustID ORDER BY Activity_Date ) AS rn
  FROM  Customers
)
WHERE rn = 1;

Output

USTID FIRST_DATE              10_Day     20_Day     30_Day     40_Day     50_Day         RN
------ ------------------- ---------- ---------- ---------- ---------- ---------- ----------
AV626  2016-05-18 00:00:00          0          0          0          0          0          1 
DX970  2016-06-23 00:00:00          0          0          0          0          0          1 
XC321  2016-04-28 00:00:00          0          0          1          2          2          1 
MT0
  • 143,790
  • 11
  • 59
  • 117
  • Hi MT0, this gives a good result, but I am sorry my OP was not as clear as it should have been, I was expecting the results to show the recurrences 'within' each window (non-aggregated) ie 0-10_days, 11-20_days, 21-30_days etc – user2798561 Sep 06 '16 at 12:03
0

Here is an answer that works for me, I have based it on your answers above, thanks for contributions from MT0 and Sagi:

SELECT CustID,
visit_date,
Prev_Visit ,
COUNT(  CASE    WHEN (Days_between_visits) <=10    THEN 1  END) AS "0-10_day" ,
COUNT(  CASE    WHEN (Days_between_visits) BETWEEN 11 AND 20    THEN 1  END) AS "11-20_day" ,
COUNT(  CASE    WHEN (Days_between_visits) BETWEEN 21 AND 30    THEN 1  END) AS "21-30_day" ,
COUNT(  CASE    WHEN (Days_between_visits) BETWEEN 31 AND 40    THEN 1  END) AS "31-40_day" ,
COUNT(  CASE    WHEN (Days_between_visits) BETWEEN 41 AND 50    THEN 1  END) AS "41-50_day" ,
COUNT(  CASE    WHEN (Days_between_visits) >50    THEN 1  END) AS "51+_day"
FROM
  (SELECT CustID,
  visit_date,
  Lead(T1.visit_date) over (partition BY T1.CustID order by T1.visit_date DESC) AS Prev_visit,
  visit_date - Lead(T1.visit_date) over (
                  partition BY T1.CustID order by T1.visit_date DESC) AS Days_between_visits
  FROM T1
) T2
WHERE Days_between_visits >0
GROUP BY T2.CustID ,
T2.visit_date ,
T2.Prev_visit ,
T2.Days_between_visits;

This returns:

CUSTID | VISIT_DATE | PREV_VISIT | DAYS_BETWEEN_VISIT | 0-10_DAY | 11-20_DAY | 21-30_DAY | 31-40_DAY | 41-50_DAY | 51+DAY 
XC321  | 2016-05-28 | 2016-04-28 |                 30 |          |           |         1 |           |           |
XC321  | 2016-06-02 | 2016-05-28 |                  5 |       1  |           |           |           |           |