6

Here's the case:

  • Our client uploads CSVs daily (overwriting the previous files) to a Google Cloud Storage bucket, one file per table.
  • We use BigQuery as a data source in Data Studio.
  • We want to automatically transfer the CSVs to BigQuery.

The thing is, even though we've:

  1. Declared the tables in BigQuery with the "Overwrite table" write preference option
  2. Configured the daily transfers via the UI (BigQuery > Transfers) to automatically load the CSVs from Cloud Storage one hour after the files are uploaded, as required by the service's documented limitations.

The automated transfer/load defaults to "WRITE_APPEND", so new rows are appended to the tables in BigQuery instead of overwriting them.

Hence the question: how/where can we set

configuration.load.writeDisposition = WRITE_TRUNCATE

as stated here, so that the tables are overwritten when the CSVs are automatically loaded?

I think that's what we're missing.
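
For reference, this is the behaviour we get when loading by hand and would like the transfer to reproduce; a minimal sketch using the google-cloud-bigquery Python client (project, bucket, dataset and table names are placeholders):

# Sketch: a one-off load job with WRITE_TRUNCATE (placeholder names).
# This is what we'd like the scheduled transfer to do automatically.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # assuming the CSVs have a header row
    autodetect=True,       # or pass an explicit schema instead
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,  # overwrite
)

load_job = client.load_table_from_uri(
    "gs://our-bucket/our_table.csv",      # placeholder source URI
    "our-project.our_dataset.our_table",  # placeholder destination table
    job_config=job_config,
)
load_job.result()  # wait for the load to finish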

Cheers.

gabi493
  • Which option are you using when writing your query results: https://cloud.google.com/bigquery/docs/writing-results – Tamir Klein Jan 02 '19 at 14:32
  • Hi @TamirKlein, we are `bigquery.admin`. – gabi493 Jan 02 '19 at 14:39
  • No, that's your access permission. I mean, how do you write your query results: via the UI, Java, or something else? – Tamir Klein Jan 02 '19 at 15:01
  • Sorry @TamirKlein, I misunderstood you; we're doing everything via UI, but we're not executing any queries we need to store. The only ones we execute for now are the ones used to manually check some results, thus we don't need to write them in any table. – gabi493 Jan 02 '19 at 15:02
  • I'm confused. Anyway, the writeDisposition is not defined at the table level; rather, it's defined when you write into the table. So you can set it based on the way you write. Hope this helps. – Tamir Klein Jan 02 '19 at 15:04
  • I'll try to clarify: For now, we're using BigQuery as a DB. The only step we need to do is to automatically overwrite the tables from BigQuery with the data from the Google bucket. When uploading the CSVs to BigQuery by hand, data is correctly overwritten, but when trying to automate the process, data is appended. Thanks for your help! – gabi493 Jan 02 '19 at 15:06
  • I see. Can you please provide a code example of how you automate the writing to BigQuery? – Tamir Klein Jan 02 '19 at 15:14
  • We do it via UI: In BigQuery > Transfers. We have a daily scheduled transfer for each table. – gabi493 Jan 02 '19 at 15:26
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/186058/discussion-between-gabi493-and-tamir-klein). – gabi493 Jan 02 '19 at 16:33

2 Answers

5

None of the above worked for us, so I'm posting this in case anyone has the same issue.

We scheduled a query that erases the table's contents just before the automatic import starts:

DELETE FROM project.tableName WHERE true

New data is then imported into an empty table, so the default "WRITE_APPEND" no longer affects us.
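
The same scheduled query can also be created programmatically; here is a sketch using the BigQuery Data Transfer Python client (project, table name and schedule are placeholders, and the schedule must fire safely before the transfer does):

# Sketch: create the cleanup DELETE as a scheduled query through the
# Data Transfer API. Project, table name and schedule are placeholders.
from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()

transfer_config = bigquery_datatransfer.TransferConfig(
    display_name="wipe tableName before the CSV transfer",
    data_source_id="scheduled_query",  # the data source ID for scheduled queries
    params={"query": "DELETE FROM project.tableName WHERE true"},
    schedule="every day 06:00",  # pick a time safely before the transfer runs
)

transfer_config = client.create_transfer_config(
    parent=client.common_project_path("our-project"),  # placeholder project ID
    transfer_config=transfer_config,
)
print("Created scheduled query:", transfer_config.name)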

gabi493
1

1) One way to do this is to use DDL to CREATE OR REPLACE your table before the transfer that imports the data runs.

This is an example of how to create (or replace) a table:

#standardSQL
CREATE OR REPLACE TABLE mydataset.top_words
OPTIONS(
  description="Top ten words per Shakespeare corpus"
) AS
SELECT
  corpus,
  ARRAY_AGG(STRUCT(word, word_count) ORDER BY word_count DESC LIMIT 10) AS top_words
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY corpus;

Now that it's created, you can import your data.

2) Another way is to use BigQuery scheduled queries.

3) If you write Python, you can find an even better solution here.

Tamir Klein