1

I am trying to write a query to find month over month percent change in user registration. \

Users table has the logs for user registrations

user_id - pk, integer

created_at - account created date, varchar

activated_at - account activated date, varchar

state - active or pending, varchar

I found the number of users for each year and month. How do I find month over month percent change in user registration? I think I need a window function?

SELECT
   EXTRACT(month from created_at::timestamp) as created_month
  ,EXTRACT(year from created_at::timestamp) as created_year
  ,count(distinct user_id) as number_of_registration
FROM users 
GROUP BY 1,2
ORDER BY 1,2

This is the output of above query:

enter image description here

Then I wrote this to find the difference in user registration in the previous year.

SELECT 
  *
  ,number_of_registration - lag(number_of_registration) over (partition by created_month) as difference_in_previous_year
FROM (
  SELECT
     EXTRACT(month from created_at::timestamp) as created_month
    ,EXTRACT(year from created_at::timestamp) as created_year
    ,count( user_id) as number_of_registration
  FROM users as u 
  GROUP BY 1,2
  ORDER BY 1,2) as temp 

The output is this: enter image description here

3 Answers3

0

You want an order by clause that contains created_year.

number_of_registration 
    - lag(number_of_registration) over (partition by created_month order by created_year) as difference_in_previous_year

Note that you don't actually need a subquery for this. You can do:

select
    extract(year from created_at) as created_year,
    extract(month from created_at) as created_year
    count(*) as number_of_registration,
    count(*) - lag(count(*)) over(partition by extract(month from created_at) order by extract(year from created_at))
from users as u 
group by created_year, created_month
order by created_year, created_month

I used count(*) instead of count(user_id), because I assume that user_id is not nullable (in which case count(*) is equivalent, and more efficient). Casting to a timestamp is also probably superfluous.

These queries work as long as you have data for every month. If you have gaps, then the problem should be addressed differently - but this is not the question you asked here.

GMB
  • 216,147
  • 25
  • 84
  • 135
  • thank you. But now, I need to calculate month over month percent change. How do I do that? –  Oct 04 '20 at 20:02
  • @Aaron: your query computes the difference, not the percent. Maybe you want: `100.0 * (number_of_registration - lag(number_of_registration) over (partition by created_month order by created_year)) / lag(number_of_registration) over (partition by created_month order by created_year)` ? – GMB Oct 04 '20 at 20:05
0

I can get the registrations from each year as two tables and join them. But it is not that effective

SELECT
   t1.created_year as year_2013
  ,t2.created_year as year_2014
  ,t1.created_month as month_of_year
  ,t1.number_of_registration_2013 
  ,t2.number_of_registration_2014 
  ,(t2.number_of_registration_2014 - t1.number_of_registration_2013) / t1.number_of_registration_2013 * 100 as percent_change_in_previous_year_month
FROM  
    (select 
         extract(year from created_at) as created_year
        ,extract(month from created_at) as created_month
        ,count(*) as number_of_registration_2013
    from users
    where extract(year from created_at) = '2013'
    group by 1,2)  t1
inner join 
    (select 
         extract(year from created_at) as created_year
        ,extract(month from created_at) as created_month
        ,count(*) as number_of_registration_2014
    from users
    where extract(year from created_at) = '2014'
    group by 1,2)  t2
on t1.created_month = t2.created_month

0

First off, Why are you using strings to hold date/time values? Your 1st step should to define created_at, activated_at as a proper timestamps. In the resulting query I assume this correction. If this is faulty (you do not correct it) then cast the string to timestamp in the CTE generating the date range. But keep in mind that if you leave it as text you will at some point get a conversion exception.

To calculate month-over-month use the formula "100*(Nt - Nl)/Nl" where Nt is the number of users this month and Nl is the number of users last month. There are 2 potential issues:

  1. There are gaps in the data.
  2. Nl is 0 (would incur divide by 0 exception)

The following handles this by first generating the months between the earliest date to the latest date then outer joining monthly counts to the generated dates. When Nl = 0 the query returns NULL indication the percent change could not be calculated.

with full_range(the_month) as   
     (select generate_series(low_month, high_month, interval '1 month')
        from (select min(date_trunc('month',created_at)) low_month
                   , max(date_trunc('month',created_at)) high_month
                from users
             ) m
     )  
select to_char(the_month,'yyyy-mm')
     , users_this_month
     , case when users_last_month = 0 
            then null::float
            else round((100.00*(users_this_month-users_last_month)/users_last_month),2) 
        end percent_change
  from ( 
         select the_month, users_this_month , lag(users_this_month) over(order by the_month)  users_last_month
           from ( select f.the_month, count(u.created_at) users_this_month  
                    from full_range f
                    left join users u on date_trunc('month',u.created_at) = f.the_month
                   group by f.the_month 
                 ) mc
       ) pc 
 order by the_month;

NOTE: There are several places there the above can be shortened. But the longer form is intentional to show how the final vales are derived.

Belayer
  • 13,578
  • 2
  • 11
  • 22