
I need to move 20 million records (approx. 10 GB) from an unpartitioned BigQuery table to a new partitioned table.

The approach is to export the original table to a GCS bucket in JSON format, using wildcard URIs. I get 304 JSON files (approx. 21 GB) of different sizes, just as the documentation says. Then I transfer that data to a new partitioned table using a BigQuery Data Transfer job, which ends successfully. I have also tried doing it with a call to load_table_from_uri in Python.
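For reference, this is a minimal sketch of that export-and-load flow with the google-cloud-bigquery client library; the project, dataset, table, bucket names and the partition column (dateField) are placeholders, not the actual ones used.

from google.cloud import bigquery

client = bigquery.Client()

# 1. Export the unpartitioned table to GCS as newline-delimited JSON (wildcard URI).
source_table = "my-project.my_dataset.old_table"        # placeholder
gcs_pattern = "gs://my-bucket/export/old_table-*.json"  # placeholder

extract_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
)
client.extract_table(source_table, gcs_pattern, job_config=extract_config).result()

# 2. Load the exported files into a new date-partitioned table.
load_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
    time_partitioning=bigquery.TimePartitioning(field="dateField"),  # placeholder column
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
load_job = client.load_table_from_uri(
    gcs_pattern, "my-project.my_dataset.new_table", job_config=load_config  # placeholder
)
load_job.result()

# Compare the loaded row count against the source table to catch silent truncation.
print(load_job.output_rows)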

The problem is that the destination table only gets 3.3 million records instead of 20 million. I have been looking into possible limits to no avail, considering:

  1. export limits
  2. bucket size limits
  3. data transfer limits
  4. load_table_from_uri limits

Has anyone had a similar experience? Is there a limitation I am not seeing, or something to be aware of in the procedure?

Thanks,

Sergio Mujica


1 Answer


Is there a reason you are not doing this with DDL directly?

CREATE TABLE `datasetId.newTableId`
PARTITION BY dateField
CLUSTER BY field2, field3
AS
SELECT * FROM `datasetId.oldTableId`;
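To confirm that no rows were dropped and that the records are spread across partitions rather than collapsed into one, you could compare the totals and inspect the partition metadata; the identifiers below follow the placeholders used above.

-- Total rows in the new table vs. the original.
SELECT
  (SELECT COUNT(*) FROM `datasetId.oldTableId`) AS old_count,
  (SELECT COUNT(*) FROM `datasetId.newTableId`) AS new_count;

-- Rows per partition of the new table.
SELECT partition_id, total_rows
FROM `datasetId.INFORMATION_SCHEMA.PARTITIONS`
WHERE table_name = 'newTableId'
ORDER BY partition_id;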
Luka
  • Thanks, that seems to work just right. The thing is that I was left with the impression (from the docs) that that DDL would put all the records in a single partition, regardless of the dateField. I am testing now and it seems to work OK... Thanks!!!! – Sergio Mujica Feb 26 '23 at 04:17
  • Anyway, I would really like to know why the other methods yielded only 3.3 million records. – Sergio Mujica Feb 26 '23 at 05:09
  • How are you counting the number of records before and after? – Luka Feb 26 '23 at 15:28
  • Two ways: the table details in the console and also count(*). – Sergio Mujica Feb 26 '23 at 16:40