
My Python application needs to export BigQuery tables into small CSV files in GCS (e.g. smaller than 1GB each).

I followed the documentation and wrote the following code:

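```python
from google.cloud import bigquery

# extract_table() starts an extract job; result() blocks until it finishes
job = bigquery.Client().extract_table('my_project.my_dataset.my_5GB_table',
                                      destination_uris='gs://my-bucket/*.csv')
job.result()
```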

The size of my_5GB_table is approximately 5GB, but the export produces a single 10GB CSV file in GCS.

I tried other tables of various sizes: some were split into files of about 200MB, while others also came out as a single huge file.
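To see exactly how an export was split, you can list the objects it wrote and flag any at or above the limit. This is a minimal sketch, not from the original post: the bucket name and prefix are placeholders, "1GB" is assumed to mean 1 GiB, and `shard_sizes` needs the `google-cloud-storage` package and credentials.

```python
# Sketch: list the files an extract job wrote under a GCS prefix and flag any
# that reach a size limit. Bucket name and prefix are placeholders.
from typing import Dict, List

ONE_GIB = 1024 ** 3  # assumption: "1GB" taken to mean 1 GiB


def shard_sizes(bucket_name: str, prefix: str = "") -> Dict[str, int]:
    """Return {object name: size in bytes} for every object under prefix."""
    # Imported here so oversized() can be used without the GCS library installed.
    from google.cloud import storage

    client = storage.Client()
    return {blob.name: blob.size for blob in client.list_blobs(bucket_name, prefix=prefix)}


def oversized(sizes: Dict[str, int], limit: int = ONE_GIB) -> List[str]:
    """Names of files whose size is at or above limit, sorted for stable output."""
    return sorted(name for name, size in sizes.items() if size >= limit)
```

Usage would be along the lines of `oversized(shard_sizes("my-bucket"))`; an empty list means every file is under the limit.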

The documentation reads as if tables are always split into 1GB files, but now I don't know what rule determines how the files are split.

Q1: How can I make sure that tables are always split into files smaller than 1GB?

Q2: Is there really no way to specify the size of the files that a table is split into?
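One hedged workaround for Q1, not taken from the thread: split the table yourself into N row subsets whose expected CSV size is under the target, and export each subset separately, so no single file can exceed the target if the estimate holds. The sketch below only does the arithmetic and builds the per-subset SQL. The `expansion` ratio of 2.0 is an assumption based on the 5GB table producing 10GB of CSV above; real CSV size depends on the data. Rows are assigned to subsets with BigQuery's `FARM_FINGERPRINT` hash of `TO_JSON_STRING(t)`, so each row lands in exactly one subset.

```python
import math


def chunk_count(table_bytes: int, target_bytes: int, expansion: float = 2.0) -> int:
    """Number of row subsets so each subset's CSV is expected to stay under target_bytes.

    expansion is an assumed CSV-size / table-size ratio (about 2x in the question).
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    return max(1, math.ceil(table_bytes * expansion / target_bytes))


def chunk_query(table: str, n: int, i: int) -> str:
    """Standard SQL selecting the rows whose hash falls in subset i of n."""
    if not 0 <= i < n:
        raise ValueError("i must be in [0, n)")
    # MOD keeps the sign of the dividend, so ABS is applied to the (small) remainder.
    return (
        f"SELECT * FROM `{table}` AS t "
        f"WHERE ABS(MOD(FARM_FINGERPRINT(TO_JSON_STRING(t)), {n})) = {i}"
    )
```

Subsets are only roughly equal in size; if one still comes out too large, rerunning with a larger N is the simple fix.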

Taichi
  • For Q2: there's a [BigQuery feature request](https://issuetracker.google.com/issues/123603261) that may be related. It asks for the ability to extract part of a table. You can watch the request to be notified when it is supported. – Yun Zhang Mar 26 '19 at 08:14
  • @YunZhang Thank you, this would be definitely what I need. – Taichi Mar 26 '19 at 08:16
  • So the answer to both Q1 and Q2 is "you can't"? :( – Taichi Mar 26 '19 at 08:24
  • That's really odd. BigQuery shards any table larger than 1GB into multiple files. I've never seen it export a single file that size before. Maybe a Googler can chime in here to explain that one. My best guess is that they are making some changes behind the scenes ahead of Next in a few weeks :) You can *not* specify the size of the files. It will vary with each export. – Graham Polley Mar 26 '19 at 13:43
  • In my experience, if I try to export a table larger than about 1GB without a wildcard in the filename, it fails with an error saying that the results need to be sharded. You are supplying a wildcard, so the output ought to be sharded on that basis. – Nij Jul 22 '20 at 12:15
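Until the feature request linked in the comments is supported, one hedged way to extract only part of a table (not from the thread) is to materialise the rows you want into a staging table with a `CREATE OR REPLACE TABLE ... AS SELECT` query, then run the same `extract_table` call on that smaller table. This is a sketch under assumptions: `client` is a `google.cloud.bigquery.Client`, `staging_table` is a hypothetical name you choose (it gets overwritten), and, following Nij's comment, the destination is required to contain a `*` wildcard.

```python
# Sketch: export only part of a table by first writing the chosen rows to a
# staging table, then extracting that table. `client` is assumed to be a
# google.cloud.bigquery.Client; staging_table is a hypothetical placeholder
# name (CREATE OR REPLACE overwrites it).


def export_subset(client, select_sql: str, staging_table: str, destination_uri: str) -> None:
    """Write the rows of select_sql to staging_table, then extract it to GCS."""
    # Exports larger than about 1GB need a wildcard so BigQuery can shard them.
    if "*" not in destination_uri:
        raise ValueError("destination_uri should contain a '*' wildcard")
    # A CREATE TABLE ... AS SELECT statement runs as an ordinary query job.
    ddl = f"CREATE OR REPLACE TABLE `{staging_table}` AS {select_sql}"
    client.query(ddl).result()  # block until the staging table is written
    # Same call as in the question, now on the smaller staging table.
    job = client.extract_table(staging_table, destination_uris=destination_uri)
    job.result()  # block until the files are written to GCS
```

This does not make the extract job honor a file size; it only keeps the total output of each export small. Afterwards, `client.delete_table(staging_table, not_found_ok=True)` removes the staging table.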
