
For example, I have a table in BigQuery with 10 million rows, and I want to export it to Google Cloud Storage in chunks of 100 thousand rows. To be clear, I want 100 CSV files, each containing 100k distinct rows from the BigQuery table.

bq extract --noprint_header dataset.abigtable gs://bucket/output/*.csv

With the command above run in Cloud Shell, the table is split into 10 or so files in Google Cloud Storage, but I have no control over how many rows end up in each file. How can I control it?

  • You can't control it because BigQuery export doesn't support it. _"When you export your data to multiple files, the size of the files will vary."_ https://cloud.google.com/bigquery/docs/exporting-data – Graham Polley Apr 30 '19 at 05:00
  • Well, if the number of splits is fixed, I suppose I could at least partition the table within BigQuery with something like ```limit 1000k offset 2000k``` and then extract the data. It would be a lot easier if I could write a script and run it in a loop, but I have no idea where or how to run such a loop. Is there any code that works in Cloud Shell and would accomplish this? (a sketch of such a loop follows these comments) – user11366694 Apr 30 '19 at 08:08
  • What do you mean by _"the number of splits"_? You also cannot control how many files/shards will be created. – Graham Polley Apr 30 '19 at 11:35
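
For reference, here is a minimal sketch of the LIMIT/OFFSET loop described in the comment above, runnable from Cloud Shell. The staging table `dataset.chunk_tmp` and the ordering column `id` are assumptions for illustration only; an ORDER BY is needed to make LIMIT/OFFSET return stable, non-overlapping slices, and note that each iteration is billed as a full scan of the source table.

```bash
#!/bin/bash
# Sketch only: export a 10M-row table in 100k-row chunks, one CSV per chunk.
# Assumed names: staging table dataset.chunk_tmp and ordering column "id".

CHUNK=100000                  # rows per exported file
TOTAL=10000000                # total rows in the source table
SRC="dataset.abigtable"       # source table from the question
TMP="dataset.chunk_tmp"       # staging table, overwritten on every pass
BUCKET="gs://bucket/output"

for ((offset=0, i=0; offset<TOTAL; offset+=CHUNK, i++)); do
  # Materialize one 100k-row slice; ORDER BY makes LIMIT/OFFSET deterministic.
  bq query --use_legacy_sql=false --replace --destination_table="$TMP" \
    "SELECT * FROM $SRC ORDER BY id LIMIT $CHUNK OFFSET $offset"

  # Export the slice to a single CSV without a header row.
  bq extract --noprint_header "$TMP" "$BUCKET/part_$(printf '%03d' "$i").csv"
done
```

Exporting to a single (non-wildcard) URI works only while each slice stays under BigQuery's 1 GB per-file export limit, which 100k rows will usually satisfy.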

1 Answer


There is no flag that would make your use case possible. If you think this feature would be helpful, you can file it as a feature request, although it needs a fair number of stars from other users before the BigQuery team will prioritize it. To make the request more compelling, describe the business use case behind it (e.g., why do you need each exported CSV file to contain exactly 100k rows?).

Christopher