
I need to move 20 million records (approx. 10 GB) from an unpartitioned BigQuery table to a new partitioned table.

The approach is to export the original table to a GCS bucket in JSON format, using wildcard URIs. I get 304 JSON files (approx. 21 GB) of different sizes, just as the documentation says. Then I transfer that data to a new partitioned table using a BigQuery Data Transfer job, which ends successfully. I have also tried doing it with a call to load_table_from_uri in Python.
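For reference, this is a minimal sketch of that export-and-load flow with the google-cloud-bigquery client library; the project, dataset, table, bucket names and the partition column (dateField) are placeholders, not the actual ones used.

from google.cloud import bigquery

client = bigquery.Client()

# 1. Export the unpartitioned table to GCS as newline-delimited JSON (wildcard URI).
source_table = "my-project.my_dataset.old_table"        # placeholder
gcs_pattern = "gs://my-bucket/export/old_table-*.json"  # placeholder

extract_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
)
client.extract_table(source_table, gcs_pattern, job_config=extract_config).result()

# 2. Load the exported files into a new date-partitioned table.
load_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
    time_partitioning=bigquery.TimePartitioning(field="dateField"),  # placeholder column
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
load_job = client.load_table_from_uri(
    gcs_pattern, "my-project.my_dataset.new_table", job_config=load_config  # placeholder
)
load_job.result()

# Compare the loaded row count against the source table to catch silent truncation.
print(load_job.output_rows)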

The problem is that the destination table only gets 3.3 million records instead of 20 million. I have been looking into possible limits to no avail, considering:

  1. export limits
  2. bucket size limits
  3. data transfer limits
  4. load_table_from_uri limits

Has anyone had a similar experience? Is there a limitation I am not seeing, or something to be aware of in the procedure?

Thanks,

Sergio Mujica


1 Answer


Is there a reason you are not doing this with DDL directly?

CREATE TABLE `datasetId.newTableId`
PARTITION BY dateField
CLUSTER BY field2, field3
AS
SELECT * FROM `datasetId.oldTableId`;
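To confirm that no rows were dropped and that the records are spread across partitions rather than collapsed into one, you could compare the totals and inspect the partition metadata; the identifiers below follow the placeholders used above.

-- Total rows in the new table vs. the original.
SELECT
  (SELECT COUNT(*) FROM `datasetId.oldTableId`) AS old_count,
  (SELECT COUNT(*) FROM `datasetId.newTableId`) AS new_count;

-- Rows per partition of the new table.
SELECT partition_id, total_rows
FROM `datasetId.INFORMATION_SCHEMA.PARTITIONS`
WHERE table_name = 'newTableId'
ORDER BY partition_id;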
Luka
  • Thanks, that seems to work just right. The thing is that I was left with the impression (from the docs) that that DDL would put all the records in a single partition, regardless of the dateField. I am testing now and it seems to work OK... Thanks!!!! – Sergio Mujica Feb 26 '23 at 04:17
  • Anyway, I would really like to know why the other methods yielded only 3.3 million records. – Sergio Mujica Feb 26 '23 at 05:09
  • How are you counting the number of records before and after? – Luka Feb 26 '23 at 15:28
  • Two ways: the table details in the console and also count(*). – Sergio Mujica Feb 26 '23 at 16:40