-1

I am trying to create a retention table like the following using SQL in Big Query but with MONTHLY cohorts;

Text

I have the following columns to use in my dataset, I am only using one table and it's name is 'curious-furnace-341507.TEST.Test_Dataset_-_Orders'

order_date order_id customer_id
2020-01-02 12345 6789

I do not need the new user column and the data goes through June 2020 I think ideally a cohort month column that lists January-June cohorts and then 5 periods across.

I have tried so many different things and keep getting errors in BigQuery I think I am approaching it all wrong. The online queries I am trying to pull from seem to use dates rather than months which is also causing some confusion as I think I need to truncate my date column to months only in the query?

Does anyone have a go-to query that will work in BigQuery for a retention table or can help me approach this? Thanks!

LA 212
  • 35
  • 6

1 Answers1

1

This may help you:

With cohorts AS (
  SELECT
    customer_id,
    MIN(DATE(order_date)) AS cohort_date
  FROM 'curious-furnace-341507.TEST.Test_Dataset_-_Orders'
  GROUP BY 1)

SELECT
  FORMAT_DATE("%Y-%m", c.cohort_date) AS cohort_mth,
  t.customer_id AS cust_id,
  DATE_DIFF(t.order_date, c.cohort_date, month) AS order_period,
FROM 'curious-furnace-341507.TEST.Test_Dataset_-_Orders' t
JOIN cohorts c ON t.customer_id = c.customer_id 
WHERE cohort_date >= ('2020-01-01') 
      AND DATE_DIFF(t.order_date, c.cohort_date, month) <=5
GROUP BY 1, 2, 3

I typically do pivots and % calcs in excel/ sheets. So this will give just you the input data you need for that.

NOTE:

  • This will give you a count of unique customers who ordered in period X (ignores repeat orders in period).
  • This also has period 0 (ordered again in cohort_mth) which you may wish to keep/ exclude.