0

I have a requirement where I might have to update the Bigquery External tables on a periodic basis. The GCS location has timestamp for every incremental run, I would like to update to the latest timestamp folder as the path of External table.

One way i see is only dropping the table and creating again by pointing it to latest folder. But, is there any other way to update it without dropping the table

Sri Bharath
  • 115
  • 1
  • 2
  • 10
  • 1
    Create a schedule query, which does `CREATE OR REPLACE EXTERNAL TABLE ... ` – Samuel Nov 19 '22 at 18:09
  • For your requirement, I would suggest you use a wildcard symbol to read files that match to a specific pattern and not provide a static file name. For example: "gs://bucketName/*.csv". You don't need to drop and recreate the table if you use a wildcard. to Let me know if that helps. – Shipra Sarkar Nov 20 '22 at 13:25
  • @ShipraSarkar - I get that this, but this works for different files under a folder. My question is what if the folder path itself is date based. Something like `gs://bucket_name/2022-11-21/*.csv` . I can have one folder for each day. – Sri Bharath Nov 21 '22 at 10:24
  • Hi @SriBharath, is your issue resolved using `CREATE OR REPLACE EXTERNAL TABLE..`? – Shipra Sarkar Nov 22 '22 at 14:46
  • @ShipraSarkar Create or Replacing External table does solve the problem. But, it doesn't on an incremental run basis, as running scheduled DDL operations doesn't look good – Sri Bharath Nov 29 '22 at 12:36
  • Hi @Sri Bharath, for your requirement, I would suggest you to use [Cloud Composer](https://cloud.google.com/blog/topics/developers-practitioners/what-cloud-composer/). You can transfer the files from GCS to BigQuery and modify the path of the external tables using Apache Airflow. Can you try the solution provided in this [StackOverflow Thread](https://stackoverflow.com/questions/70220549/how-to-modify-the-source-file-path-of-a-bigquery-external-table-in-airflow)? Let me know if that helps. – Shipra Sarkar Dec 02 '22 at 13:07

1 Answers1

0

As suggested by @Samuel , you can use the SQL statement CREATE or REPLACE EXTERNAL TABLES for your requirement. Scheduled queries support DML and DDL statements which can be used to create the new tables. You can use the below mentioned query parameter to create the table according to your schedule :

My_database_name.my_table_name.my_results_{run_date}

For more information you can refer to this documentation.

Shipra Sarkar
  • 1,385
  • 3
  • 10