
For example, I have a table in BigQuery with 10 million rows, and I want to export it to Google Cloud Storage in chunks of 100 thousand rows. To be clear, I want 100 CSV files, each containing 100k distinct rows from the BigQuery table.

bq extract --noprint_header dataset.abigtable gs://bucket/output/*.csv

With the command above run in Cloud Shell, the table is split into 10 or so files in Google Cloud Storage, but I have no control over how many rows end up in each file. How can I control it?

  • You can't control it because BigQuery export doesn't support it. _"When you export your data to multiple files, the size of the files will vary."_ https://cloud.google.com/bigquery/docs/exporting-data – Graham Polley Apr 30 '19 at 05:00
  • Well, if the number of splits is fixed, I suppose I could at least partition the table within BigQuery with something like ```limit 1000k offset 2000k``` and then extract the data. It would be a lot easier if I could write a script and run it in a loop, but I have no idea where or how to run such a loop. Is there any code that works in Cloud Shell and would accomplish this? (a sketch of such a loop follows these comments) – user11366694 Apr 30 '19 at 08:08
  • What do you mean by _"the number of splits"_? You also cannot control how many files/shards will be created. – Graham Polley Apr 30 '19 at 11:35
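
For reference, here is a minimal sketch of the LIMIT/OFFSET loop described in the comment above, runnable from Cloud Shell. The staging table `dataset.chunk_tmp` and the ordering column `id` are assumptions for illustration only; an ORDER BY is needed to make LIMIT/OFFSET return stable, non-overlapping slices, and note that each iteration is billed as a full scan of the source table.

```bash
#!/bin/bash
# Sketch only: export a 10M-row table in 100k-row chunks, one CSV per chunk.
# Assumed names: staging table dataset.chunk_tmp and ordering column "id".

CHUNK=100000                  # rows per exported file
TOTAL=10000000                # total rows in the source table
SRC="dataset.abigtable"       # source table from the question
TMP="dataset.chunk_tmp"       # staging table, overwritten on every pass
BUCKET="gs://bucket/output"

for ((offset=0, i=0; offset<TOTAL; offset+=CHUNK, i++)); do
  # Materialize one 100k-row slice; ORDER BY makes LIMIT/OFFSET deterministic.
  bq query --use_legacy_sql=false --replace --destination_table="$TMP" \
    "SELECT * FROM $SRC ORDER BY id LIMIT $CHUNK OFFSET $offset"

  # Export the slice to a single CSV without a header row.
  bq extract --noprint_header "$TMP" "$BUCKET/part_$(printf '%03d' "$i").csv"
done
```

Exporting to a single (non-wildcard) URI works only while each slice stays under BigQuery's 1 GB per-file export limit, which 100k rows will usually satisfy.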

1 Answer


There is no flag that would make your use case possible. If you think this feature would be helpful, you can file it as a feature request, although it needs a fair number of stars from other users before the BigQuery team will prioritize it. To make the request more compelling, describe the business use case behind it (e.g., why do you need each exported CSV file to contain exactly 100k rows?).

Christopher