
Anyone know of any plans to add support for delete parts of data from a table in Google Bigquery? The issue we have right now is we are using it for analytics of data points we collect over time. We want to run the queries over the last X days of data, however after the last X days of data we no longer need to store the data in BigQuery.

The only way currently we can think of to delete the data would be to delete the entire table of data, then recreate it and load it with X number of days of data. This would though require us to store our data in daily CSV files too, which isn't optimal.

Any recommendations on how to handle this issue or if there is a delete rows query coming in the near future?

nsandersen
Daum

7 Answers


2016 update: BigQuery can delete and update rows now -- Fh

https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax


Thanks for describing your use case. BigQuery is append-only by design. We currently don't support deleting single rows or a batch of rows from an existing dataset.

Currently, to implement a "rotating" log system you must either:

  1. Create a new table each day (and delete older tables if necessary)
  2. Append your data to a table and query by time/date

I would actually recommend creating a new table for each day. Since BigQuery charges by the amount of data scanned per query, this would be the most economical option for you, rather than having to query over the entire massive dataset every time.
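The table-per-day pattern can be sketched in a few lines. This is a minimal sketch, not BigQuery API code: the `logs_YYYYMMDD` suffix convention and both helper names are assumptions, and actually deleting the expired tables would go through the `bq` tool or a client library.

```python
from datetime import date, timedelta


def daily_table_ids(dataset: str, base: str, keep_days: int, today: date):
    """Table IDs covering the retention window, newest first.

    Assumes a table-per-day naming scheme like `logs_20240115`;
    adjust the suffix to whatever your loader produces.
    """
    return [
        f"{dataset}.{base}_{today - timedelta(days=n):%Y%m%d}"
        for n in range(keep_days)
    ]


def expired_table_ids(existing, keep):
    """Tables present in the dataset but outside the retention window."""
    return sorted(set(existing) - set(keep))
```

Each day a job would create today's table, then drop whatever `expired_table_ids` returns, so only the last X days ever exist.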

By the way - how are you currently collecting your data?

Michael Manoochehri
  • Good suggestion. We currently store the data in MySQL, dump the data we actually process to a CSV, and upload it. I'm looking around for any limits on the number of tables in a specific dataset, but can't find any. Is this correct on there being no limits? – Daum May 15 '12 at 16:47
  • Just an FYI if you are taking the rotating tables approach -- BigQuery now supports table expiration time. You can update the table with the bq tool using `bq update --expiration <seconds> dataset.table`. – Jordan Tigani May 15 '12 at 16:48
  • @Daum BigQuery doesn't have a limit on the number of tables you can create per dataset. – Michael Manoochehri May 15 '12 at 19:35
  • Is deletion of data based on some user-specified filters still not possible? I'm just wondering, if I'm streaming data into BigQuery and I get some duff data, is it possible to clear it? Are there any patterns to handle duff data? – opensourcegeek Jan 27 '15 at 16:18
  • Created a related question, we might have found a way on how to do it: http://stackoverflow.com/questions/34838622/delete-update-table-entries-by-joining-2-tables-on-google-bigquery-without-impor/ – Manuel Arwed Schmidt Jan 17 '16 at 13:19
  • Finally I can delete and update! I've been waiting for this feature for years. – Charles Chow Mar 13 '17 at 19:05
  • There are some constraints on deleting data from a table. If any streaming write operation is in progress on the table, then you can't delete data. – Priyabrata Jun 27 '17 at 04:33

To delete records in BigQuery, you first have to enable Standard SQL.

Steps for enabling Standard SQL:

  1. Open the BigQuery web UI.
  2. Click Compose Query.
  3. Click Show Options.
  4. Uncheck the Use Legacy SQL checkbox.

This will enable the BigQuery Data Manipulation Language (DML), which lets you update, insert, and delete data in BigQuery tables.

Now you can write a plain SQL query to delete the record(s):

DELETE [FROM] target_name [alias] WHERE condition

You can refer to: https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#delete_statement
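As a sketch of putting that syntax together: the helper below only builds the DML string, and the table name, condition, and function name are illustrative, not part of any BigQuery API. The commented-out lines show how the string could be handed to the `google-cloud-bigquery` client.

```python
def build_delete(table: str, condition: str) -> str:
    """Build a BigQuery Standard SQL DELETE statement.

    Hypothetical helper for illustration. For real values, prefer the
    client library's query parameters over string interpolation.
    """
    return f"DELETE FROM `{table}` WHERE {condition}"


# Executing it needs google-cloud-bigquery and credentials, so it is
# only sketched here:
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   client.query(build_delete("project.dataset.events",
#                             "event_date < '2016-01-01'")).result()
```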

Daniel Compton
rahulb

If you want to delete all the rows, use the query below (the `#standardSQL` prefix enables Standard SQL):

delete from `project-id.data_set.table_name` where 1=1;

If you want to delete particular rows, use the query below:

delete from `project-id.data_set.table_name` where (your condition);

sethu

If you want to delete all rows in a table, then:

DELETE FROM {dataset}.{table} WHERE TRUE

  • I get an error message saying `UPDATE or DELETE statement over table tenor.trending_terms would affect rows in the streaming buffer, which is not supported`. – Tobi Aug 01 '20 at 07:49
  • In case anyone else is wondering, it takes BigQuery time to ingest/distribute the streaming buffer. If you are getting the `would affect rows in the streaming buffer, which is not supported` error, simply wait a few minutes. [This answer](https://stackoverflow.com/a/42987306/6100384) seems to indicate it can take up to 90 minutes. I found in my case ~5 minutes was enough to clear it out. – BU0 Jul 10 '23 at 19:09
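The wait-and-retry advice in the comment above can be wrapped in a small helper. This is a sketch under stated assumptions: the function name and timings are made up here, and it matches on the error text quoted in the comments rather than any official exception type from the client library.

```python
import time


def retry_on_streaming_buffer(run_delete, attempts=5, wait_seconds=300):
    """Run `run_delete` (any callable that raises on failure), retrying
    when the error mentions the streaming buffer.

    Waits `wait_seconds` between attempts to give BigQuery time to
    flush the buffer; re-raises any other error, or the last one if
    all attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return run_delete()
        except Exception as err:
            if "streaming buffer" not in str(err) or attempt == attempts - 1:
                raise
            time.sleep(wait_seconds)
```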

What worked for me:

TRUNCATE TABLE `project_id.dataset.table_name`
igorkf

Also, if applicable, you can try BigQuery's `OMIT RECORD IF` to return all rows except those you want to delete, then create a new table from that query result.

(example taken from Google reference docs)

SELECT * FROM
  publicdata:samples.github_nested

OMIT RECORD IF
  COUNT(payload.pages.page_name) <= 80;

Source: https://cloud.google.com/bigquery/query-reference


This is only relevant if you are not using Legacy SQL (DML statements such as DELETE require Standard SQL).

You could try the following:

DELETE FROM {dataset}.{table} WHERE {constraint}
Rich
brettster