3

Been cracking my skull over this, it's quite interesting though. I have a table in which each row represents an app launch event, logged from several different applications. I am using a Redshift database and therefore Redshift's implementation of Postgresql.

Each row in this table has these columns:

  • An application ID (app_id)
  • A client_id which identifies a user
  • A date (creatively named date) which is in SQL date format

I need to calculate this ratio:

  • (distinct weekly active users) / (distinct active users from the current week AND it's previous three weeks)

This means, I am not looking for monthly active users as in Calendar months (Jan,Feb,Mar etc.)

I need to calculate the distinct active users in that ISO Week divided by the active users in that week and it's previous three weeks (thus the MAU is basically past four four-week active users).

Now, I have the query to get weekly active users as below. Week is represented as 'Week Number - Year'.

select app_id, TO_CHAR(date,'IW-IYYY') Week, count (distinct client_id) ClientsInWeek
from weplay_singtel.gamelaunch t1
where app_id <> 1751
group by app_id, Week
order by app_id, TO_DATE(TO_CHAR(date,'IW-IYYY'),'IW-IYYY');

Which gives data as shown in the attached screenshot below: Weekly Active Users Screenshot As you can see, against each ISO week and year, I have the number of distinct clients.

To get the active users in that and the previous three weeks together, I initially used a Lag and summed the 3 previous weekly counts, for each week from the above data. However, this sum does not actually give me distinct users, it is simply a sum of distinct clients in each week. If the same user opens an app in week 1 and then again in week 2, he will be counted twice in this method, which is not acceptable!

I need the distinct clients in a moving window of four weeks. I suspect I need to use partitions that somehow move over a week, but I cannot seem to write or visualize this query.

Post this, I need to similarly also create the ratio: distinct Daily Active Users / distinct Active Users in the last 30 days from that day.

Any guidance would be really helpful. Can this be done using a moving partition? Or using a Cross Apply (in Postgresql Lateral)?

Hope this makes sense! If anything is unclear please ask and I will get back with more information almost immediately. Thanks in advance folks.

Nilay Panchal
  • 541
  • 6
  • 17

1 Answers1

0

This computes the distinct clients in separate queries and join them:

with t as (
    select *
    from weplay_singtel.gamelaunch
    where app_id <> 1751
)
with w as (
    select
        app_id,
        date_trunc('week', "date") as Week,
        count (distinct client_id) as ClientsInWeek
    from t
    group by app_id, 2
), w4 as (
    select
        app_id,
        daterange(
            date_trunc('week', "date" - interval '4 weeks'),
            date_trunc('week', "date"),
            '[]'
        ) as w4,
        count (distinct client_id) as ClientsInW4
    from t
    group by app_id, 2
)
select
    w.app_id, week, clientsinweek, clientsinw4,
    clientsinweek::float / clientsinw4
from
    w
    inner join
    w4 on
        week = upper(w4)
        and
        w.app_id = w4.app_id

The 4 weeks period is grouped as a date range. Now post data and I can test it.

Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260
  • Thanks a lot for this Clodaldo! Will this work in Redshift SQL though? Doesn't seem to run for me, i'm connecting to Redshift via SQLWorkbench/J. – Nilay Panchal Jul 02 '14 at 02:00
  • @nilay I have no experience with Redshift. What is the error? I guess it is the `daterange` datatype. If you can confirm that I can try to find an alternative path. – Clodoaldo Neto Jul 02 '14 at 08:48