2

I have recently stumbled unto this post to calculate MAUs in BigQuery (given Firebase Analytics Data). I worked from this query to answer DAU, WAU, and MAU for my project, but I have seen some minor discrepancies for the MAU and then WAU calculations. The aim is to calculate the active user metrics that I can compare with the Firebase Dashboard here.

1) Calculating MAU

#standardSQL
SELECT
  FORMAT_TIMESTAMP(
      '%Y-%m',
      TIMESTAMP_MICROS(user_dim.first_open_timestamp_micros)) AS year_and_month, 
  COUNT(DISTINCT user_dim.app_info.app_instance_id) AS monthly_visitors
FROM `<<project-id>>.app_events_*`
WHERE (_TABLE_SUFFIX BETWEEN '20170101' AND '20170731')
------ Inclusive for both the start-date and end-date
AND user_dim.first_open_timestamp_micros BETWEEN 1483228800000000 AND 1501545599000000
GROUP BY year_and_month 
ORDER BY year_and_month DESC;

# 1483228800000000   Sunday, January 1, 2017 12:00:00 AM
# 1501545599000000   Monday, July 31, 2017 11:59:59 PM
# https://www.epochconverter.com/

Questions:

  • The output is at least 30% lower than what I saw in the Firebase Console.
  • Is the "user_dim.first_open_timestamp_micros" only writing the column_name or helping to determine the output value? What if the user had a first open in May, did not come back in June, and then - had a session in July. According to the above code, would that user not then be counted only in May (and not July) ?
  • I guess a bigger "First Open Timestamp"-range in the WHERE-clause would be better, because it then takes into consideration more installers. How would I add in that wider range without getting a weird output like below:

    enter image description here

  • I thought the table suffix would have set the range correctly? Without the "AND user_dim.first_open_timestamp_micros BETWEEN 1483228800000000 AND 1501545599000000"-clause, one gets the wrong output like above.

2) Calculating WAU

#standardSQL
SELECT
  FORMAT_TIMESTAMP(
      '%Y-%W',
      TIMESTAMP_MICROS(user_dim.first_open_timestamp_micros)) AS year_and_week, 
  COUNT(DISTINCT user_dim.app_info.app_instance_id) AS weekly_visitors 
FROM `<<project-id>>.app_events_*`
WHERE (_TABLE_SUFFIX BETWEEN '20170306' AND '20170917')
------ Inclusive for both the start-date and end-date
AND user_dim.first_open_timestamp_micros BETWEEN 1488758400000000 AND 1505692799000000
GROUP BY year_and_week 
ORDER BY year_and_week DESC;

# 1488758400000000    Monday, March 6, 2017 12:00:00 AM
# 1505692799000000    Sunday, September 17, 2017 11:59:59 PM

Questions

  • The above query gives me at least 50% lower than what I see in the Firebase Console. I know Distinct values will be about 10% difference due to approximation, but 50% difference is too big.
  • I also used '%Y-%W' to get my values, and not '%Y-%w' (like '%Y-%m' in the MAU calculation), would this play a role ?

3) Calculating DAU

I only managed to calculate the distinct IDs that installed on that day. Which is essentially, first_open in the firebase documentation.

    #standardSQL
    SELECT
      FORMAT_TIMESTAMP(
          '%D',
          TIMESTAMP_MICROS(user_dim.first_open_timestamp_micros)) AS year_and_day, 
      COUNT(DISTINCT user_dim.app_info.app_instance_id) AS day_visitors 
    FROM `<<project-id>>.app_events_*`
    WHERE (_TABLE_SUFFIX = '20170917')
    ------ Inclusive for both the start-date and end-date
    AND user_dim.first_open_timestamp_micros BETWEEN 1505606400000000 AND 1505692799000000 
    GROUP BY year_and_day
    ORDER BY year_and_day;

# 1505606400000000  Sunday, September 17, 2017 12:00:00 AM
# 1505692799000000  Sunday, September 17, 2017 11:59:59 PM

Questions:

  • How can I calculate the DAUs, taking into consideration all users that installed before this day and had a session again on the 17th Sept.
d_-
  • 1,391
  • 2
  • 19
  • 37
  • 1
    Small style nit: use a macro to abstract the TIMESTAMP casting, and you won't need to use ints in the WHERE clause (e.g., `$FIRST_OPEN_TIME BETWEEN TIMESTAMP '2017-09-17' AND TIMESTAMP '2017-09-18'`) – Bruno Kim Sep 20 '17 at 10:10

0 Answers0