0

Is there a way to view historical information about BigQuery tables over time? For example, if I wanted to check how the size of a table has changed?

Attempts:
None, because I am unaware if this is doable.

  • 2
    well you can query table at a point in time using time travel. Doc - https://cloud.google.com/bigquery/docs/time-travel#query_data_at_a_point_in_time. But in your case i guess you need to build this custom functionality using time travel itself. – Mr.Batra Sep 23 '21 at 10:34
  • Yah @Mr.Batra I was looking at time travel but it only provides snapshots over the last 7 days. I guess I'll have to write a script locally or GCP to submit SELECT COUNT(*) queries and record them somewhere... yeesh – randomdatascientist Sep 23 '21 at 17:54

2 Answers2

1

Depending specifically on what info you want to get, there are multiple ways of accomplishing this:

Time travel

As mentioned in the comments to the question, to get the status of the dataset on a given time for the last 7 days, you can use time travel.

Metrics

To get some idea about the rows uploaded or the storage used in a dataset, you could use the GCP metrics, specifically rows uploaded or stored byted.

Partition the table

Another possibility is to partition the tables by ingestion time. This will create a pseudocolumn in your table called _PARTITIONTIME, and you'll be able to run queries according to the ingestion time (down to an hour granularity, according to the boundary limits for partitioned tables).

If you partition your tables, then you can query the rows inserted every hours with a query like:

SELECT _PARTITIONTIME as insert_time,
count(1) as count,
FROM `project-id.dataset-id.table-id` 
GROUP BY _PARTITIONTIME
ORDER BY 1 DESC

If you want to know how many rows were at some point in time, you can run another query like this:

SELECT COUNT(1) as count
FROM `project-id.dataset-id.table-id`
WHERE _PARTITIONTIME < TIMESTAMP("2016-04-15 08:00:00")

More examples in the docs for partitioned tables.

Jofre
  • 3,718
  • 1
  • 23
  • 31
1

For tables you can now check on TABLE_STORAGE_TIMELINE view with information for all tables in the current or specified project. This view contains metadata about historical table storage usage.

Temu
  • 859
  • 4
  • 11