0

When I query BQ with ROLLUP by potentially large set of grouping criteria field (in this case campaign_group_id)

for example:

SELECT
  campaign_group_id AS campaign_group_id,
  DATE(DATE_ADD(TIME, 3, 'HOUR')) AS DAY,
  SUM(impressions) AS imps
FROM
  [browser_traffic.2016_05_28],
  [browser_traffic.2016_05_29]
WHERE
  ( DATE_ADD( TIME, 3, "HOUR") >= '2016-05-28 00:00:00'
    AND DATE_ADD( TIME, 3, "HOUR") < '2016-05-30 00:00:00' )
GROUP EACH BY ROLLUP (campaign_group_id, DAY)
ORDER BY DAY ASC, campaign_group_id ASC
LIMIT 500

BQ returns many rows with subtotals which is not applicable in my use case:

+-------------------+------+-----------+
| campaign_group_id | day  |   imps    |
+-------------------+------+-----------+
|              NULL | NULL | 158423933 |
|                61 | NULL |         0 |
|               496 | NULL |     79870 |
|               497 | NULL |     10492 |
|               809 | NULL |         0 |
|               936 | NULL |      2451 |
|               937 | NULL |         0 |
|               940 | NULL |      6844 |
|               942 | NULL |    207685 |
|               946 | NULL |         0 |
|               961 | NULL |         0 |
|               975 | NULL |     16167 |
|               976 | NULL |     15767 |
|              1018 | NULL |         0 |
|              1020 | NULL |         0 |
|              1022 | NULL |    766875 |
|              1039 | NULL |    355765 |
...

I need to somehow reduce subtotal rows from result but leave intact grand total row (which is first row in above result)

Is it possible that BQ returns only grand total row for selected fields?

Andrew
  • 3
  • 3

3 Answers3

1

You can filter the result of the query with another SELECT statement:

SELECT campaign_group_id, day, imps 
FROM (
    ... your rollup query with LIMIT removed ...
)
WHERE (day IS NOT NULL) OR (campaign_group_id IS NULL)
LIMIT 500
Michael Entin
  • 7,189
  • 3
  • 21
  • 26
  • Thank you @Michael - I think it is exactly what I need! – Andrew May 31 '16 at 11:03
  • Tested today you suggestion but if I have more grouping columns the result will contain subtotal rows as well. More problems will be in place if such columns are nullable which is exactly in scope of my usecases... – Andrew May 31 '16 at 18:49
  • 1
    use GROUPING(), scroll for full answer – Felipe Hoffa Jun 01 '16 at 01:05
0

If you are interested in grand totals - you most likely do not need ROLLUP

Rather you can consider regular GROUP BY as below

SELECT
  DATE(DATE_ADD(TIME, 3, 'HOUR')) AS DAY,
  SUM(impressions) AS imps
FROM
  [browser_traffic.2016_05_28],
  [browser_traffic.2016_05_29]
WHERE
  ( DATE_ADD( TIME, 3, "HOUR") >= '2016-05-28 00:00:00'
    AND DATE_ADD( TIME, 3, "HOUR") < '2016-05-30 00:00:00' )
GROUP 1
ORDER BY DAY ASC
LIMIT 500
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • I updated my question.. I interested in grand total and data rows as well but not subtotal rows. Thank you for answering – Andrew May 31 '16 at 06:25
0

Notice that the function GROUPING() exists too, and will help you with this:

SELECT year, name, SUM(number) s,
       GROUPING(year) is_grouping_year, 
       GROUPING(name) is_grouping_name
FROM [bigquery-public-data:usa_names.usa_1910_2013] 
WHERE name IN ('John', 'Jovana')
AND year BETWEEN 2012 AND 2013
GROUP BY ROLLUP(name, year)
ORDER BY year, name

year    name    s       is_grouping_year    is_grouping_name     
null    null    21182   1                   1    
null    John    21164   1                   0    
null    Jovana  18      1                   0    
2012    John    10576   0                   0    
2012    Jovana  18      0                   0    
2013    John    10588   0                   0

Basically you are asking for the rows where is_grouping_year and is_grouping_name are either both 0 or 1.

From the docs:

When using the ROLLUP function, you can use the GROUPING function to distinguish between rows that were added because of the ROLLUP function and rows that actually have a NULL value for the group key.

https://cloud.google.com/bigquery/query-reference#groupby

Felipe Hoffa
  • 54,922
  • 16
  • 151
  • 325
  • Yes, been there... In my query date is computed field eg DATE(DATE_ADD(TIME, 3, 'HOUR')) so is unavailible for GROUPING. But I try to wrap it up in subselect. – Andrew Jun 01 '16 at 04:00