I want to schedule a data transfer job from Cloud Storage to BigQuery.
I have an application that continuously dumps data to a GCS bucket path (say, gs://test-bucket/data1/*.avro), and I want to move that data to BigQuery soon after each object is created in GCS.
I don't want to reload every file in the folder on each run. I only want to load the objects that were added to the folder since the last run.
The BigQuery Data Transfer Service accepts Avro files as input, but as far as I can tell it does not accept a folder, and it loads all objects rather than only the newly added ones.
I am new to this, so I might be missing some functionality. How can I achieve this?
Please note: I want to schedule a job that loads data at a fixed frequency (every 10 or 15 minutes). I don't want a trigger-based solution, because the number of objects generated will be huge.
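
To make the requirement concrete, here is a minimal sketch of the selection each scheduled run would need to perform: keep only the objects created after the previous run and remember a new watermark for the next one. The `GcsObject` class, the `select_new_objects` function, and the sample timestamps are hypothetical names and values I made up for illustration. In a real job the listing would come from Cloud Storage and the resulting URIs would be passed to a BigQuery load job, which this sketch does not do.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from fnmatch import fnmatch


@dataclass(frozen=True)
class GcsObject:
    """Hypothetical stand-in for one entry of a Cloud Storage object listing."""
    name: str
    created: datetime


def select_new_objects(objects, last_run, pattern="data1/*.avro"):
    """Return (URIs of matching objects created after last_run, new watermark)."""
    # Keep only objects that match the path pattern and are newer than the last run.
    fresh = [o for o in objects if fnmatch(o.name, pattern) and o.created > last_run]
    # The next run starts from the newest object seen; unchanged if nothing is new.
    watermark = max((o.created for o in fresh), default=last_run)
    ordered = sorted(fresh, key=lambda o: o.created)
    uris = [f"gs://test-bucket/{o.name}" for o in ordered]
    return uris, watermark


# Example: one run at a 15-minute interval (timestamps are made up).
listing = [
    GcsObject("data1/a.avro", datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)),
    GcsObject("data1/b.avro", datetime(2024, 1, 1, 10, 12, tzinfo=timezone.utc)),
    GcsObject("data1/notes.txt", datetime(2024, 1, 1, 10, 13, tzinfo=timezone.utc)),
]
last_run = datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc)
uris, watermark = select_new_objects(listing, last_run)
```

With this sample listing, only `data1/b.avro` should be selected, since `a.avro` predates the last run and `notes.txt` does not match the pattern. What I am looking for is a managed or scheduled way to get this behavior without writing and maintaining the watermark logic myself.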