0

Using Oracle SQL, I’m trying to calculate total unique visits to a website. The table I’m using to write the query does not have a timestamp which includes minutes and seconds just DDMMYY and every row in the table represents a customer click on the page. The table designates a new “session” every hour, regardless of whether that actually reflects a new visit from the customer’s POV. What I must do is use non-consecutive sessions as a proxy for unique visits. So, if there is an hour break between visits the previous consecutive grouping is one visit. I define a visit as a unique combination of customer ID + session day + session hour. If there are consecutive session hours within a customer + day combination, I count that as a single session. The HOUR filed contains string values that concatenate date with hour. In order to do the appropriate visit count calculation, I will need to parse out the hour and subtract from the previous (lag) row in order to determine if there is greater than an hour “break”.

Example of Raw Data:
TRANS_TO_DATE   CUSTOMER_ID HOUR
10/21/17        1007589445  October 21, 2017, Hour 1
10/21/17        1007589445  October 21, 2017, Hour 2
10/21/17        1007589445  October 21, 2017, Hour 2
10/21/17        1007589445  October 21, 2017, Hour 2
10/21/17        1007589445  October 21, 2017, Hour 3
10/21/17        1007589445  October 21, 2017, Hour 5
10/21/17        1007589445  October 21, 2017, Hour 6
10/21/17        1007589445  October 21, 2017, Hour 23
10/21/17        1007589445  October 21, 2017, Hour 23
10/21/17        1007589445  October 21, 2017, Hour 23
11/1/17         1007589445  November 1, 2017, Hour 10
1/1/18          1007589445  January  1, 2018, Hour 10
1/1/18          1007589445  January  1, 2018, Hour 10
1/1/18          1007589445  January  1, 2018, Hour 11
1/1/18          1007589445  January  1, 2018, Hour 14
1/1/18          1007589445  January  1, 2018, Hour 20
1/1/18          1007589445  January  1, 2018, Hour 22

The visit count is actually this:

Customer_id Day Hour    Visit Grouping 
1007589445  October 21, 2017    1   Visit 1
1007589445  October 21, 2017    2   Visit 1
1007589445  October 21, 2017    3   Visit 1
1007589445  October 21, 2017    5   Visit 2
1007589445  October 21, 2017    6   Visit 2
1007589445  October 21, 2017    23  Visit 3
1007589445  November 1, 2017    10  Visit 1
1007589445  January 1, 2018 10  Visit 1
1007589445  January 1, 2018 11  Visit 1
1007589445  January 1, 2018 14  Visit 2
1007589445  January 1, 2018 20  Visit 3
1007589445  January 1, 2018 21  Visit 4

Customer 1007589445 had

3 visits on October 21, 2017 - 1 visit on November 1, 2017 - 4 visits on January 1, 2018

Total visits: 8

Below is the sql code I have so far which needs to be modifide to satisfy the critera above.

select 
CUSTOMER_ID, 
TRANS_TO_DATE,
HOUR,
count (HOUR) as visits
from mstr_clickstream_vw 
where trans_to_date between start_date and end_date
and web_store_ind='US'
 group by CUSTOMER_ID, TRANS_TO_DATE,HOUR
user2270911
  • 195
  • 1
  • 5
  • 18

1 Answers1

0

You can get the hour with:

cast(trim(substr(hour, -2)) as int)

Then to use this to assign sessions by using lag() and a cumulative conditional aggregation:

select cs.*,
       sum(case when trans_to_date = prev_ttd and prev_hh = hh then 0
                when trans_to_date = prev_ttd and prev_hh = hh - 1 then 0
                when hh = 0 and prev_hh = 23 and trans_to_date = prev_ttd + interval '1' day then 0
                else 1
           end) over (partition by customer_id order by trans_to_date, hh) as grouping
from (select cs.*,
             lag(trans_to_date) over (partition by customer_id order by trans_to_date, hh) as prev_ttd,
             lag(hh) over (partition by customer_id order by trans_to_date, hh) as prev_hh
      from (select cs.*,
                   cast(trim(substr(hour, -2)) as int) as hh
            from mstr_clickstream_vw cs
           ) cs
      ) cs;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • ERROR: ORACLE prepare error: ORA-00936: missing expression. I can't seem to get your example to work. Where is the variable interval coming from in "prev_ttd + interval '1' day "? And are there too many parentheses in cast(trim(substr(hour, -2) as int) as hh – user2270911 Jan 13 '20 at 18:11