I need to get data from a third-party API and ingest it into Google BigQuery. I would also like to automate this process with Google services so it runs periodically.
I am trying to use Cloud Functions, but a function needs a trigger, and I am not sure which one fits a periodic job. I have also read about App Engine, but I believe it is not a good fit for a single function that only pulls data from an API.
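From what I have read, Cloud Scheduler can publish to a Pub/Sub topic on a cron schedule, and a Pub/Sub-triggered function would then run periodically. A minimal sketch of what I imagine that function looks like (the endpoint URL is made up):

import requests

API_URL = 'https://example.com/api/data'  # made-up third-party endpoint


def func_data(event, context):
    """Background function triggered by a Pub/Sub message.

    Cloud Scheduler publishes a message to the topic on a cron
    schedule, so this runs periodically with no server to manage.
    """
    response = requests.get(API_URL)
    response.raise_for_status()
    # ... hand the response off to Cloud Storage or BigQuery here ...
    print('Fetched {} bytes from the API'.format(len(response.content)))

Is this the right pattern?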
Another doubt: do I need to stage the data in Cloud Storage first, or can I load it straight into BigQuery (see the sketch after my code below)? Should I use Dataflow, and does it require any particular configuration? Here is my current attempt:
import requests
from google.cloud import storage


def upload_blob(bucket_name, request_url, destination_blob_name):
    """Fetches the API response and uploads it to the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    # Fetch the payload from the third-party API.
    response = requests.get(request_url)
    # Without this call nothing is actually written to the bucket.
    blob.upload_from_string(response.text, content_type='application/json')
    print('File {} uploaded to {}.'.format(destination_blob_name, bucket_name))


def func_data(request_url):
    BUCKET_NAME = 'dataprep-staging'
    BLOB_NAME = 'any_name'
    upload_blob(BUCKET_NAME, request_url, BLOB_NAME)
    return 'Success!'
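Regarding loading straight into BigQuery: from the client library documentation I understand a load job can take JSON rows directly, without staging them in Cloud Storage first. A sketch of what I mean (the table ID is a placeholder):

from google.cloud import bigquery

client = bigquery.Client()
table_id = 'my-project.my_dataset.my_table'  # placeholder destination
rows = [{'blob': 'some json'}]  # rows parsed from the API response
job_config = bigquery.LoadJobConfig(autodetect=True)
load_job = client.load_table_from_json(rows, table_id, job_config=job_config)
load_job.result()  # wait for the load job to finish

Would that be acceptable for a periodic ingest, or is staging in Cloud Storage preferred?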
I would appreciate advice on the architecture (which Google services) I should use to build this pipeline. For example: use a Cloud Function to get the data from the API, schedule the job with service 'X' to write the data to storage, and finally pull the data from storage into BigQuery.
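For that last step, I imagine pulling the staged file from Cloud Storage into BigQuery would look roughly like this (again a sketch; the table ID is a placeholder and the blob is assumed to be newline-delimited JSON):

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
)
load_job = client.load_table_from_uri(
    'gs://dataprep-staging/any_name',   # the blob written above
    'my-project.my_dataset.my_table',   # placeholder destination
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish

Is that the idiomatic way to do the storage-to-BigQuery step, or is Dataflow needed somewhere in between?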