
I would like to know if it is possible to limit the BigQuery query size when running a query through the web user interface?

My idea is to test the query, but instead of querying all my tables, I would like to query just a part of them, for instance a limited number of rows.

LIMIT does not reduce my query cost, so the idea is to find a function similar to "row_number" or "fetch".

Sorry, I'm a marketer and not a developer, so thank you in advance for your kind help.

sPujade
  • Please post the query and tell us the actual size/cost. Without seeing a query, only the preview mode is free of charge. – Pentium10 Feb 12 '17 at 08:29
  • SELECT SUM(t0.totals.visits) AS t0._totals.visits_, t0.geoNetwork.country, t0.userId FROM (SELECT * FROM TABLE_DATE_RANGE([XXX:85801771.ga_sessions_], TIMESTAMP('20161211'), TIMESTAMP('20170315'))) AS t0 GROUP EACH BY t0.geoNetwork.country, t0.userId ORDER BY t0._totals.visits_ DESC; – sPujade Feb 12 '17 at 09:12
  • Query complete (5.7s elapsed, 314 MB processed). The cost depends on how much data is processed, but also on how many times per day the query is launched. Thanks Pentium10 for your help. – sPujade Feb 12 '17 at 09:12
  • There is not much to optimize here; it already uses only 314 MB, which is extremely low. Please note that on BigQuery the first 1 TB of query analysis per month is free, and after that it costs $5 per TB. I think we should speak about optimizing queries once they cost too much. – Pentium10 Feb 12 '17 at 09:48

1 Answer


How to limit BigQuery query size for testing ... ?

1 - Try to minimize the number of tables involved in your testing. In your query there are 60+ daily tables involved, covering the dates between 2016-12-11 and today:

SELECT <fields_list> FROM
TABLE_DATE_RANGE([XXX:85801771.ga_sessions_],
  TIMESTAMP('20161211'),
  TIMESTAMP('20170315'))

Instead, you can use the same day as both the start and the end of the time range, thus drastically reducing the number of involved tables (down to just one) and the overall scan size. For example:

SELECT <fields_list> FROM
TABLE_DATE_RANGE([XXX:85801771.ga_sessions_],
  TIMESTAMP('20161211'),
  TIMESTAMP('20161211'))

2 - Minimize the number of rows. The ability to do so really depends on how your table is being loaded with data. If the table is loaded incrementally, you can use so-called table decorators.
Note: this technique only works within the last 7 days of a table's history.

For example, the query below will scan only the data that was in the table one hour ago (a so-called snapshot decorator):

SELECT <fields_list> FROM [XXX:85801771.ga_sessions_20170212@-3600000]  

This works well with the most recent day's table, especially at the start of the day when the table is not big yet.

So, to limit the scan further, you can use the version below (a so-called range decorator), which gives you only the data added between one hour and half an hour ago:

SELECT <fields_list> FROM [XXX:85801771.ga_sessions_20170212@-3600000--1800000]  

Finally, @0 is a special case that references the oldest possible snapshot of the table: either 7 days in the past, or the table's creation time if the table is less than 7 days old. For example:

SELECT <fields_list> FROM [XXX:85801771.ga_sessions_20170210@0]  

3 - Test against a sampled table. If you expect to experiment with your query again and again, you can first prepare a downsized version of your table with just as many rows as you need, applying whatever sampling logic fits your business case. To limit the number of rows you can use a LIMIT clause; to get random rows you can use the RAND function, as in the sketch below.
After the sampled table is prepared, run all your queries against it until you have the final version; after that, run the final version against your original table(s).
And by the way, to create the sampled table you need to set a destination table under options in the Web UI.
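
For example, here is a minimal sketch of how such a sampled table could be built (the 10% sampling rate, the 100,000-row cap, and the single-day source table are illustrative assumptions; adapt them to your needs):

SELECT <fields_list>
FROM [XXX:85801771.ga_sessions_20161211]
WHERE RAND() < 0.1
LIMIT 100000

Building the sample still scans the source table once, but every subsequent experiment then scans only the small sampled table (for instance a hypothetical destination table [XXX:85801771.ga_sessions_sample]).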

Mikhail Berlyant