
Is streaming data into a column-partitioned BigQuery table supported? I'm having trouble; I get this error:

BadRequest: 400 POST https://www.googleapis.com/bigquery/v2/projects/...$20180410/insertAll: Streaming to metadata partition of column based partitioning table ...$20180410 is disallowed.

The BigQuery streaming documentation says streaming to partitioned tables is allowed, but all of its examples use ingestion-time partitioning; I don't see any mention of the newer column-based partitioning.

Is it supported, and am I just doing it wrong? Specifically, the error occurs when I explicitly add the partition suffix ($YYYYMMDD). When I don't use the suffix the write succeeds, but it doesn't look like the data is actually partitioned.

Here's my sample code:

We have a table with a few columns, let's say this:

date: DATE (partitioned field)
name: STRING
count: INTEGER

I'm trying to do a streaming insert, via:

from datetime import date

from google.cloud import bigquery

data = [
    # datetime.date takes year, month, day as integers
    {'date': date(2018, 4, 10), 'name': 'x', 'count': 10},
    {'date': date(2018, 4, 10), 'name': 'y', 'count': 5},
]
client = bigquery.Client(...)
table_ref = client.dataset(dataset).table(tableid + '$20180410')
schema = client.get_table(table_ref).schema

# Raises the 400 Bad Request above.
result = client.insert_rows(table_ref, data, selected_fields=schema)
Greg
  • please try to remove $20180410 – Lei Chen Apr 13 '18 at 17:25
  • When I remove the partition "$20180410" the write succeeds, but near as I can tell the data is not actually partitioned. If I then try to delete data from just the partition via `bq rm ...$20180410` the rows are not removed. – Greg Apr 13 '18 at 17:27
  • Streaming to a particular partition of a column-partitioned table is disallowed. You can stream to the table without the suffix, and the data will be distributed to partitions according to the partitioning field's value. – Lei Chen Apr 13 '18 at 17:30
it might still be in the streaming buffer while you are deleting the partition. Could you please try deleting again in a few minutes? – Lei Chen Apr 13 '18 at 17:33
  • could you please share your full table name with projectid and datasetid? – Lei Chen Apr 13 '18 at 18:32
Aha, thanks Lei. I think that's it. I streamed rows to the table at around 10:40am Pacific time, and they showed up within a few minutes, but it took at least a few hours for them to be partitioned. I tried at 1:10pm and was able to bq rm them. That's pretty slow! I think I'll stick with the upload from file, since that seems to partition right away. – Greg Apr 13 '18 at 20:11
  • @Greg Can you add your **Updated/Solved** section as an answer and accept it? – VictorGGl Apr 23 '18 at 08:05
  • @VictorGGl aha, done. – Greg Apr 24 '18 at 16:24

1 Answer


Updated/Solved (Hat-tip and thanks to Lei Chen. See comments for discussion)

To stream into a column-partitioned table, do not append the $YYYYMMDD suffix to the table id. BigQuery will do the partitioning for you, but it appears to be a separate stage. My records appeared in the table from the streaming buffer within a few minutes, but took several hours to be properly partitioned.
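As a sketch of the fix (assuming the same table layout as the question; `dataset_id` and `table_id` are placeholders), build the rows with DATE values serialized as ISO `YYYY-MM-DD` strings, and reference the table without any `$20180410` suffix:

```python
from datetime import date

def build_rows(records):
    # Turn (date, name, count) tuples into streaming-insert payloads.
    # A DATE column accepts 'YYYY-MM-DD' strings, which isoformat() produces.
    return [{'date': d.isoformat(), 'name': n, 'count': c}
            for d, n, c in records]

rows = build_rows([(date(2018, 4, 10), 'x', 10),
                   (date(2018, 4, 10), 'y', 5)])

# The streaming call itself (needs google-cloud-bigquery and credentials;
# note there is no '$20180410' partition suffix on the table id):
#
#   client = bigquery.Client()
#   table_ref = client.dataset(dataset_id).table(table_id)
#   schema = client.get_table(table_ref).schema
#   errors = client.insert_rows(table_ref, rows, selected_fields=schema)
```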

This is easier to verify with ingestion-time partitioning, where you can query the _PARTITIONTIME pseudocolumn. For column-based partitioning, I didn't find a good way to tell whether the data had been partitioned other than bq rm-ing the partition and seeing whether the records vanished.
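One way to check (a sketch, not from the original answer; project/dataset/table names below are placeholders): filter on the partitioning column itself, which lets BigQuery prune partitions, and inspect the client library's streaming-buffer metadata. The helper just assembles the SQL string:

```python
def partition_count_query(project, dataset, table, day_iso):
    # Standard SQL that counts rows in one day's partition of a
    # column-partitioned table; filtering on the partitioning column
    # ('date' here, as in the question) lets BigQuery prune partitions.
    return ("SELECT COUNT(*) AS n FROM `{p}.{d}.{t}` "
            "WHERE date = DATE '{day}'").format(
                p=project, d=dataset, t=table, day=day_iso)

# Running it, and checking the buffer (needs google-cloud-bigquery;
# Table.streaming_buffer is None once the buffer has been flushed):
#
#   sql = partition_count_query('my-project', 'my_dataset', 'my_table',
#                               '2018-04-10')
#   n = list(client.query(sql).result())[0].n
#   buf = client.get_table(table_ref).streaming_buffer
#   if buf is not None:
#       print('rows still in streaming buffer:', buf.estimated_rows)
```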

  • Anything that is still in the streaming buffer of a partitioned table will show up in the __UNPARTITIONED__ partition until it has been extracted, at which point it will show up in the correct partition. – Victor Mota Oct 19 '18 at 06:48
  • As of now, streaming-buffer data is automatically included in query results when you use a partition filter in your query. The same goes for batch uploading of files. – Haris Nadeem Dec 20 '18 at 21:06