
I have a few Python scripts that process CSV files sent to a cloud bucket and upload the output file to another bucket:

1. init.py (main file)
2. google_client.py (reads the input file and uploads the output file)
3. DP_Workflow.py (submits the file to the DP workflow to generate the output file)

This works fine locally, but I am trying to find a way to get these scripts deployed so they run against each CSV file as it is uploaded to another bucket. Is there a way to trigger these files automatically?

Sid

1 Answer


You should create a Cloud Function; it will be triggered whenever a CSV file is uploaded to your bucket. To deploy a Cloud Function that responds to a new file in a bucket, you can use the following command:

gcloud functions deploy YOUR_FUNCTION \
  --entry-point=handler \
  --runtime=python37 \
  --trigger-resource=YOUR_TRIGGER_BUCKET_NAME \
  --trigger-event=google.storage.object.finalize

Then rename your init.py to main.py (Google requires this filename) and put the following entrypoint into that file:

def handler(data, context):
    bucket = data['bucket']
    file = data['name']
    # whatever processing you want here
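To make the flow concrete, here is a minimal sketch of such a handler that downloads the triggering CSV and uploads a result to a second bucket. The bucket name, the `output_name` convention, and the deferred import are all illustrative assumptions, not part of the original answer; the processing step itself is left as a placeholder:

```python
import os

OUTPUT_BUCKET = "YOUR_OUTPUT_BUCKET"  # assumption: replace with your output bucket

def output_name(input_name):
    # Hypothetical naming convention: report.csv -> report_processed.csv
    base, _, _ext = input_name.rpartition(".")
    return (base or input_name) + "_processed.csv"

def handler(data, context):
    # Deferred import so the module still loads where the package is absent;
    # google-cloud-storage must be listed in requirements.txt for deployment.
    from google.cloud import storage

    client = storage.Client()
    # /tmp is the only writable directory in the Cloud Functions runtime.
    local_in = os.path.join("/tmp", os.path.basename(data["name"]))
    client.bucket(data["bucket"]).blob(data["name"]).download_to_filename(local_in)

    # ... call your DP_Workflow processing here, writing its result to local_out ...
    local_out = os.path.join("/tmp", output_name(os.path.basename(data["name"])))

    client.bucket(OUTPUT_BUCKET).blob(output_name(data["name"])).upload_from_filename(local_out)
```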

Your directory layout:

  • main.py
  • requirements.txt
  • google_client.py
  • DP_Workflow.py
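For reference, the requirements.txt in that layout might look like the following (package versions are illustrative; pin whatever your code actually needs, one package per line):

```text
google-cloud-storage==1.25.0
pyodbc==4.0.28
```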
Cloudkollektiv
  • In addition to this answer, there is official Google Cloud Functions documentation about triggers: https://cloud.google.com/functions/docs/concepts/events-triggers. In your case you probably need the Pub/Sub one with the event type OBJECT_FINALIZE: https://cloud.google.com/storage/docs/pubsub-notifications#events. Also check this documentation about publishing object changes: https://cloud.google.com/storage/docs/reporting-changes – iker lasaga Feb 07 '20 at 12:01
  • I fixed a few errors with the function earlier, but ended up with the error below, which is related to pyodbc. Is there a way to get this module installed in the Cloud Function's Python 3.7 env? import pyodbc ModuleNotFoundError: No module named 'pyodbc' – Sid Feb 07 '20 at 18:13
  • Added the directory layout for clarity. Make sure to include a requirements.txt; this is what Google uses to install your dependencies. Inside it you specify pyodbc==4.0.28 (one package per line). – Cloudkollektiv Feb 08 '20 at 13:35
  • Yes, I did specify the package in my requirements file. I even tried using Terraform to spin up this resource and ended up with the error below; not sure if it has anything to do with the python37 env. Deployment failure: Build failed: {"error": {"canonicalCode": "INVALID_ARGUMENT", "errorMessage": "`pip_download_wheels` had stderr output:\n Failed building wheel for pyodbc\nERROR: Failed to build one or more wheels\n\nerror: `pip_download_wheels` returned code: 1", "errorType": "InternalError", "errorId": "A664F8E9"}} – Sid Feb 08 '20 at 17:44
  • I found another Stack Overflow thread where a similar issue is discussed. It's related to the encoding of the requirements.txt file: https://stackoverflow.com/questions/53589601/google-app-engine-python-3-7-build-error-pip-download-wheels-returned-code-1 – iker lasaga Feb 10 '20 at 15:50
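The fix discussed in that linked thread amounts to re-saving requirements.txt as plain UTF-8, since a BOM or UTF-16 encoding can break the build. As a sketch (the function name and the set of fallback encodings are illustrative assumptions):

```python
def to_plain_utf8(raw):
    """Decode bytes that may carry a UTF-8 BOM or be UTF-16 encoded,
    and re-encode them as plain UTF-8."""
    # utf-8-sig handles both plain UTF-8 and UTF-8 with a BOM.
    for encoding in ("utf-8-sig", "utf-16"):
        try:
            return raw.decode(encoding).encode("utf-8")
        except UnicodeDecodeError:
            continue
    raise ValueError("requirements.txt is in an unexpected encoding")

# Usage: rewrite the file in place.
# with open("requirements.txt", "rb") as f:
#     raw = f.read()
# with open("requirements.txt", "wb") as f:
#     f.write(to_plain_utf8(raw))
```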