I have recently stumbled unto this post to calculate MAUs in BigQuery (given Firebase Analytics Data). I worked from this query to answer DAU, WAU, and MAU for my project, but I have seen some minor discrepancies for the MAU and then WAU calculations. The aim is to calculate the active user metrics that I can compare with the Firebase Dashboard here.
1) Calculating MAU
#standardSQL
SELECT
FORMAT_TIMESTAMP(
'%Y-%m',
TIMESTAMP_MICROS(user_dim.first_open_timestamp_micros)) AS year_and_month,
COUNT(DISTINCT user_dim.app_info.app_instance_id) AS monthly_visitors
FROM `<<project-id>>.app_events_*`
WHERE (_TABLE_SUFFIX BETWEEN '20170101' AND '20170731')
------ Inclusive for both the start-date and end-date
AND user_dim.first_open_timestamp_micros BETWEEN 1483228800000000 AND 1501545599000000
GROUP BY year_and_month
ORDER BY year_and_month DESC;
# 1483228800000000 Sunday, January 1, 2017 12:00:00 AM
# 1501545599000000 Monday, July 31, 2017 11:59:59 PM
# https://www.epochconverter.com/
Questions:
- The output is at least 30% lower than what I saw in the Firebase Console.
- Is the "user_dim.first_open_timestamp_micros" only writing the column_name or helping to determine the output value? What if the user had a first open in May, did not come back in June, and then - had a session in July. According to the above code, would that user not then be counted only in May (and not July) ?
I guess a bigger "First Open Timestamp"-range in the WHERE-clause would be better, because it then takes into consideration more installers. How would I add in that wider range without getting a weird output like below:
I thought the table suffix would have set the range correctly? Without the "AND user_dim.first_open_timestamp_micros BETWEEN 1483228800000000 AND 1501545599000000"-clause, one gets the wrong output like above.
2) Calculating WAU
#standardSQL
SELECT
FORMAT_TIMESTAMP(
'%Y-%W',
TIMESTAMP_MICROS(user_dim.first_open_timestamp_micros)) AS year_and_week,
COUNT(DISTINCT user_dim.app_info.app_instance_id) AS weekly_visitors
FROM `<<project-id>>.app_events_*`
WHERE (_TABLE_SUFFIX BETWEEN '20170306' AND '20170917')
------ Inclusive for both the start-date and end-date
AND user_dim.first_open_timestamp_micros BETWEEN 1488758400000000 AND 1505692799000000
GROUP BY year_and_week
ORDER BY year_and_week DESC;
# 1488758400000000 Monday, March 6, 2017 12:00:00 AM
# 1505692799000000 Sunday, September 17, 2017 11:59:59 PM
Questions
- The above query gives me at least 50% lower than what I see in the Firebase Console. I know Distinct values will be about 10% difference due to approximation, but 50% difference is too big.
- I also used '%Y-%W' to get my values, and not '%Y-%w' (like '%Y-%m' in the MAU calculation), would this play a role ?
3) Calculating DAU
I only managed to calculate the distinct IDs that installed on that day. Which is essentially, first_open in the firebase documentation.
#standardSQL
SELECT
FORMAT_TIMESTAMP(
'%D',
TIMESTAMP_MICROS(user_dim.first_open_timestamp_micros)) AS year_and_day,
COUNT(DISTINCT user_dim.app_info.app_instance_id) AS day_visitors
FROM `<<project-id>>.app_events_*`
WHERE (_TABLE_SUFFIX = '20170917')
------ Inclusive for both the start-date and end-date
AND user_dim.first_open_timestamp_micros BETWEEN 1505606400000000 AND 1505692799000000
GROUP BY year_and_day
ORDER BY year_and_day;
# 1505606400000000 Sunday, September 17, 2017 12:00:00 AM
# 1505692799000000 Sunday, September 17, 2017 11:59:59 PM
Questions:
- How can I calculate the DAUs, taking into consideration all users that installed before this day and had a session again on the 17th Sept.