
I have started working with Prefect and I am trying to save my task results to Google Cloud Storage:

import prefect
from prefect.engine.results import GCSResult
from prefect.run_configs import DockerRun, LocalRun
from prefect.storage import Docker, Local

@prefect.task(checkpoint=True, result=GCSResult(bucket="redacted"))
def task1():
    return 1


storage = Local(...)
run_config = LocalRun()

with prefect.Flow(
    "myflow", 
    storage=storage, 
    run_config=run_config
) as flow:
    results = task1()

flow.run()

Provided my GOOGLE_APPLICATION_CREDENTIALS environment variable points to my service account key, everything works fine.

However, when trying to dockerize my flow, I run into some difficulties:

storage = Docker(...)
run_config = DockerRun(dockerfile="DockerFile")

with prefect.Flow(
    "myflow", 
    storage=storage, 
    run_config=run_config
) as flow:
    ... # Same definition as previously

flow.register()

In this case, when I try to run my flow with a Docker agent (be it on the same machine the flow was registered from or on another one), I get this error:

google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials.
Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. 
For more information, please see https://cloud.google.com/docs/authentication/getting-started
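
This error means the Google client library found no credentials inside the container; the environment variable and key file set on the host are most likely not visible there. As a minimal diagnostic sketch, the following standard-library check could be run inside the container to confirm this. The function name `describe_gcp_credentials` is hypothetical and is not part of Prefect or google-auth.

```python
import os


def describe_gcp_credentials(environ=None):
    """Report whether GOOGLE_APPLICATION_CREDENTIALS is usable.

    Hypothetical helper for debugging only. Returns "unset" if the
    variable is absent or empty, "missing file" if it points to a path
    that does not exist, and "ok" otherwise.
    """
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "unset"
    if not os.path.isfile(path):
        return "missing file"
    return "ok"
```

Calling `describe_gcp_credentials()` from within a task and logging the result would show which of the three cases applies in the running container.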

Following the documentation, I have tried to set a GCP_CREDENTIALS secret in Prefect Cloud, but to no avail: I am still running into the same error.

I have also tried to save the results in a separate GCSUpload task, but I am still having the same error.

One solution I can see would be to package the credentials inside my Docker image through the DockerFile. However, I feel this is a use case where I should be using Prefect secrets.

LoicM

1 Answer


I have worked out a solution that retrieves the credentials with a PrefectSecret task.

I had to create an additional GCSUpload task, which takes the result of task1 and saves it directly to GCS.

My final code looks like this:


import prefect
from prefect.tasks.gcp.storage import GCSUpload
from prefect.tasks.secrets import PrefectSecret
from prefect.run_configs import DockerRun
from prefect.storage import Docker

retrieve_gcp_credentials = PrefectSecret("GCP_CREDENTIALS")


# No GCSResult here: the result is persisted by the GCSUpload task below.
@prefect.task
def task1():
    return "1"

save_results_to_gcp = GCSUpload(bucket="redacted")

storage = Docker()
run_config = DockerRun()

with prefect.Flow(
    "myflow", 
    storage=storage, 
    run_config=run_config
) as flow:
    credentials = retrieve_gcp_credentials()
    results = task1()
    save_results_to_gcp(results, credentials=credentials)

flow.run()

(Note that I also had to change the type of the value returned by task1, as GCSUpload can only upload strings or bytes.)
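
For results that are not already strings, one option is to serialize them before the upload step. This is only a sketch under my own assumptions: it assumes the value is JSON-compatible, and the helper names `to_uploadable` and `from_uploaded` are hypothetical, not part of Prefect.

```python
import json


def to_uploadable(value):
    """Serialize a JSON-compatible value to a string for upload."""
    return json.dumps(value)


def from_uploaded(data):
    """Rebuild the original value from downloaded text or bytes."""
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    return json.loads(data)
```

With this, task1 could keep returning 1 and an intermediate task could pass `to_uploadable(results)` to the upload task.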

This is good enough for my use case (simply persisting the results in GCS), but I'll leave the question open in case someone knows how to use GCSResult here, as it would also be useful for caching.

LoicM