2

Each row in my table has a field that is an array, and I'd like to get a field from the first array entry.

For example, if my row is

[
  {
    "user_dim": {
      "user_id": "123",
      "user_properties": [
        {
          "key": "content_group",
          "value": {
            "value": {
              "string_value": "my_group"
            }
          }
        }
      ]
    },
    "event_dim": [
      {
        "name": "main_menu_item_selected",
        "timestamp_micros": "1517584420597000"
      },
      {
        "name": "screen_view",
        "timestamp_micros": "1517584420679001"
      }
    ]
  }
]

I'd like to get

user_id: 123, content_group: my_group, timestamp_1517584420597000

Gabi
  • 413
  • 2
  • 8
  • 1
    Is there a reason to want to use legacy SQL for this? It's possible, but standard SQL has much better array support. – Elliott Brossard Feb 07 '18 at 14:44
  • @ElliottBrossard Thanks for your comment! :) I'm combining the query with TABLE_DATE_RANGE([app_events_], TIMESTAMP('2018-01-29'), TIMESTAMP('2018-02-02')) and I haven't found a standard SQL equivalent. Is there one? – Gabi Feb 07 '18 at 14:47
  • 1
    Yes, please look at the [documentation for wildcard tables](https://cloud.google.com/bigquery/docs/querying-wildcard-tables). You would be selecting from `app_events_*` and filtering on e.g. `_TABLE_SUFFIX BETWEEN '20180129' AND '20180202'` (assuming that is the right suffix). – Elliott Brossard Feb 07 '18 at 15:32

1 Answers1

2

As Elliott mentioned - BigQuery Standard SQL has way much better support for ARRAYs than legacy SQL. And in general, BigQuery team recommend using Standard SQL

So, below is for BigQuery Standard SQL (including handling wildcard stuff)

#standardSQL
SELECT 
  user_dim.user_id AS user_id,
  (SELECT value.value.string_value 
     FROM UNNEST(user_dim.user_properties) 
     WHERE key = 'content_group' LIMIT 1
  ) content_group,
  (SELECT event.timestamp_micros 
     FROM UNNEST(event_dim) event 
     WHERE name = 'main_menu_item_selected'
  ) ts
FROM `project.dataset.app_events_*`
WHERE _TABLE_SUFFIX BETWEEN '20180129' AND '20180202'   

with result (for the dummy example from your question)

Row     user_id     content_group       ts   
1       123         my_group            1517584420597000     
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • 2
    Downvoted because question specifically asks about Legacy SQL, while this answer talks about Standard SQL. This is not an answer, only a suggestion, hence should be a comment. – Crembo Sep 04 '18 at 10:53
  • Thank you @Crembo for leaving comments, so this downvote is not as confusing as it would be without reasoning behind it. really appreciate it! Wanted to follow up on this - as you can see from comments on question itself - OP did not know initially that there is a way to make it with standard sql and actually asked if there is an equivalent of what he asked in standard sql - and that is what I answered :o) Yet another reason for us usually providing answer for standard sql vs. legacy sql is that it is very much recommended by BigQuery team as they recommend users to migrate to Standard one :) – Mikhail Berlyant Sep 04 '18 at 18:14