4

I gather analytics with Firebase Analytics which I linked to Google BigQuery.

I have the following data in BigQuery (unnecessary columns/rows are left off, the dataset looks similar to https://bigquery.cloud.google.com/table/firebase-analytics-sample-data:ios_dataset.app_events_20160607?tab=preview):

| event_dim.name | event_dim.params.key | event_dim.params.value.string_value |
|----------------|----------------------|-------------------------------------|
| read_post      | post_id              | p_100                               |
|                | group_id             | g_1                                 |
|                | user_id              | u_1                                 |
| open_group     | post_id              | p_200                               |
|                | group_id             | g_2                                 |
|                | user_id              | u_1                                 |
| open_group     | post_id              | p_300                               |
|                | group_id             | g_1                                 |
|                | user_id              | u_3                                 |

I want to query the following data:

  • event name
  • user id
  • group id

I tried the following query:

SELECT
  event_dim.name,
  FIRST(IF(event_dim.params.key = "user_id", event_dim.params.value.string_value, NULL)) WITHIN RECORD user_id,
  FIRST(IF(event_dim.params.key = "group_id", event_dim.params.value.string_value, NULL)) WITHIN RECORD group_id
FROM
  [xxx:xxx_IOS.app_events_20161102]
LIMIT
  1000

The problem with the above query is that the aggregate function FIRST will give the wrong result because the SELECT statements with a WITHIN modifier will return a list of results. The FIRST function will only give the correct result in case of the first row.

Timon
  • 362
  • 4
  • 10

1 Answers1

9

Using standard SQL (uncheck "Use Legacy SQL" under "Show Options") you can do:

SELECT
  event_dim.name,
  (SELECT value.string_value FROM UNNEST(params)
   WHERE key = 'user_id') AS user_id,
  (SELECT value.string_value FROM UNNEST(params)
   WHERE key = 'group_id') AS group_id
FROM `firebase-analytics-sample-data.ios_dataset.app_events_20160607`,
  UNNEST(event_dim) AS event_dim
LIMIT 1000;

If you only want rows that have both 'user_id' and 'group_id', you can filter out the NULL values:

SELECT * FROM (
  SELECT
    event_dim.name,
    (SELECT value.string_value FROM UNNEST(params)
     WHERE key = 'user_id') AS user_id,
    (SELECT value.string_value FROM UNNEST(params)
     WHERE key = 'group_id') AS group_id
  FROM `firebase-analytics-sample-data.ios_dataset.app_events_20160607`,
    UNNEST(event_dim) AS event_dim
)
WHERE user_id IS NOT NULL AND group_id IS NOT NULL
LIMIT 1000;
Elliott Brossard
  • 32,095
  • 2
  • 67
  • 99
  • Okay, that seems to work perfectly! The data is shown in a different order than the preview, but that's not a big deal. Would you know by any chance how I could add multiple tables (with the same schema) to this query? – Timon Nov 03 '16 at 17:21
  • 1
    Sure, you may be interested in [wildcard tables](https://cloud.google.com/bigquery/docs/wildcard-tables) or, instead of `FROM MyTable`, you can do `FROM (SELECT * FROM MyTable UNION ALL SELECT * FROM MyOtherTable UNION ALL ...)`. The [migration guide](https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql#comma_operator_with_tables) has a couple of examples with `UNION ALL`, too. – Elliott Brossard Nov 03 '16 at 17:31
  • Hah, I was just reading about wildcard tables. That's a great feature! Thanks a lot! – Timon Nov 03 '16 at 17:32
  • Thank you, I was tearing my hair out trying to get my columns out this way. Really helped! – Almund Mar 24 '21 at 04:16