
Objective:

Get the IDs that logged in during week 1, then count how many of those IDs logged in again in week 2. Repeat the same logic for week 2 to week 3, then week 3 to week 4, and so on. This exercise needs to be run every week. The IDs need to be segmented into cohorts based on the month and year they subscribed.

Story:

The first table (member) has the email and its creation date. The second table (login) has the login activity. First, I need to group emails by creation date (month-year) to create the cohorts. Then I need to compare the login activity week over week for each cohort. Is it possible to make this query dynamic so it can be run each week?
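
Roughly, the first step I have in mind is a weekly count of distinct active members per cohort, something like this untested sketch (note that pairing DATEPART(YEAR, ...) with ISO_WEEK can mislabel weeks around the December/January boundary):

-- Untested sketch: distinct active members per cohort per ISO week
SELECT CONCAT(DATEPART(YEAR,m.date_created),'-',DATEPART(MONTH,m.date_created)) AS Cohort
    ,DATEPART(YEAR,l.login_time)     AS Login_Year
    ,DATEPART(ISO_WEEK,l.login_time) AS Login_Week
    ,COUNT(DISTINCT m.email)         AS Active_Members
FROM member AS m
INNER JOIN login AS l ON l.email = m.email
WHERE CAST(l.login_time AS DATE) >= '2019-01-01'
GROUP BY DATEPART(YEAR,m.date_created), DATEPART(MONTH,m.date_created)
    ,DATEPART(YEAR,l.login_time), DATEPART(ISO_WEEK,l.login_time)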

Output: The result should look like this (columns are the year and ISO week of the login, rows are the subscription-month cohorts):

+--------+--------+--------+--------+--------+
| Cohort | 2019-1 | 2019-2 | 2019-3 | 2019-4 |...
+--------+--------+--------+--------+--------+
| 2018-3 |   7000 |   6800 |   7400 |   7100 |...
| 2018-4 |   6800 |   6500 |   8400 |   8000 |...
| 2018-5 |   9500 |   8000 |   6400 |   6200 |...
| 2018-6 |   9100 |   8500 |   8000 |   7800 |...
| 2018-7 |  10000 |   8000 |   7000 |   6800 |...
+--------+--------+--------+--------+--------+

What I tried:

-- My attempt: tag each member with its cohort and the ISO week of each login,
-- keeping only the first login per email per week (Log_Rank = 1).
SELECT CONCAT(DATEPART(YEAR,m.date_created),'-',DATEPART(MONTH,m.date_created)) AS Cohort
    ,CONCAT(subquery.[YYYY],'-',subquery.[ISO])                                 AS YYYY_ISO
    ,m.email
FROM member AS m
INNER JOIN (SELECT DATEPART(YEAR,log.login_time)     AS [YYYY]
                ,DATEPART(ISO_WEEK,log.login_time)   AS [ISO]
                ,log.email
                ,ROW_NUMBER() OVER (PARTITION BY DATEPART(YEAR,log.login_time),
                                                 DATEPART(ISO_WEEK,log.login_time),
                                                 log.email
                                    ORDER BY log.login_time ASC) AS Log_Rank
            FROM login AS log
            WHERE CAST(log.login_time AS DATE) >= '2019-01-01'
           ) AS subquery
    ON m.email = subquery.email
    AND subquery.Log_Rank = 1
ORDER BY Cohort
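
This only tags each email with its cohort and the ISO week of each login; I still need to aggregate the counts and turn the year-week values into columns. Since the columns have to change every week, I assume a dynamic PIVOT is needed. A rough, untested sketch follows (here #cohort_week is a hypothetical temp table holding only the Cohort, YYYY_ISO and Active_Members counts from the grouping step above; STRING_AGG needs SQL Server 2017+):

DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- Build the column list ([2019-1],[2019-2],...) from whatever weeks exist.
-- (Plain string ordering, so weeks 10+ would not sort chronologically without extra handling.)
SELECT @cols = STRING_AGG(CAST(QUOTENAME(YYYY_ISO) AS nvarchar(max)), ',') WITHIN GROUP (ORDER BY YYYY_ISO)
FROM (SELECT DISTINCT YYYY_ISO FROM #cohort_week) AS w;

SET @sql = N'SELECT Cohort, ' + @cols + N'
    FROM #cohort_week
    PIVOT (MAX(Active_Members) FOR YYYY_ISO IN (' + @cols + N')) AS p
    ORDER BY Cohort;';

EXEC sys.sp_executesql @sql;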

Sample Data:

CREATE TABLE member
    ([email] varchar(50), [date_created] Datetime)

CREATE TABLE login
    ([email] varchar(50), [login_time] Datetime)

INSERT INTO member
VALUES
    ('player123@google.com', '2018-03-01 05:00:00'),
    ('player999@google.com', '2018-04-12 12:00:00'),
    ('player555@google.com', '2018-04-25 20:15:00')

INSERT INTO login
VALUES
    ('player123@google.com', '2019-01-07 05:30:00'),
    ('player123@google.com', '2019-01-08 08:30:00'),
    ('player123@google.com', '2019-01-15 06:30:00'),
    ('player999@google.com', '2019-01-08 11:30:00'),
    ('player999@google.com', '2019-01-10 07:30:00'),
    ('player555@google.com', '2019-01-08 04:30:00')
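
For the actual week-over-week comparison from the objective (how many of the members active in one ISO week logged in again the following week), I imagine something like this untested sketch; the wk + 1 join condition ignores the year rollover, so it would need refinement around January:

WITH weekly AS (
    SELECT DISTINCT
        CONCAT(DATEPART(YEAR,m.date_created),'-',DATEPART(MONTH,m.date_created)) AS Cohort
        ,m.email
        ,DATEPART(YEAR,l.login_time)     AS yr
        ,DATEPART(ISO_WEEK,l.login_time) AS wk
    FROM member AS m
    INNER JOIN login AS l ON l.email = m.email
    WHERE CAST(l.login_time AS DATE) >= '2019-01-01'
)
SELECT w1.Cohort
    ,w1.yr
    ,w1.wk
    ,COUNT(DISTINCT w1.email) AS active_this_week
    ,COUNT(DISTINCT w2.email) AS retained_next_week   -- of those, how many came back
FROM weekly AS w1
LEFT JOIN weekly AS w2
    ON  w2.Cohort = w1.Cohort
    AND w2.email  = w1.email
    AND w2.yr     = w1.yr
    AND w2.wk     = w1.wk + 1   -- simplification: breaks across the year boundary
GROUP BY w1.Cohort, w1.yr, w1.wk
ORDER BY w1.Cohort, w1.yr, w1.wk;
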
  • Where are the values like `7000` coming from? I'm a little confused on that one. Are we missing a (significant) amount of data? – Thom A Feb 06 '19 at 21:26
  • Randomly generated. Basically, if week 1 is compared to week 2, week 2 can't have more than week 1. When week 2 is compared to week 3, week 3 can't have more than week 2, and so on. @Larnu – Roger Steinberg Feb 06 '19 at 21:27
  • But `7000` is less than `8000`? Although we have some really good sample data, I have no idea what your logic is here. – Thom A Feb 06 '19 at 21:29
  • @Larnu Yes, that's fine. The logic is as follows: if 1,000 people (who subscribed in March 2018) logged in in week 1 and 700 of them logged in in week 2, that's a 70% retention. I need to find out how to do this for the other cohorts (i.e. subscribers of Apr 2018, May 2018, and so on), week over week. The output above serves as a visual representation of the end product. It looks like a pivot should be used, but I'm unsure. – Roger Steinberg Feb 06 '19 at 21:35
  • @RogerSteinberg Hi, if I understand what you want, I think you have to use PIVOT, GROUP BY, a dynamic EXEC query, and some aggregate functions like COUNT. – Sanpas Feb 06 '19 at 22:11
  • @pascalsanchez spot on. – Roger Steinberg Feb 06 '19 at 22:14
