My Beam Dataflow job succeeds locally (with DirectRunner) and fails on the cloud (with DataflowRunner).
The issue is localized to this code snippet:
import datetime

import apache_beam as beam
from google.cloud import storage
from google.cloud.storage.blob import Blob


class SomeDoFn(beam.DoFn):
    ...
    def process(self, gcs_blob_path):
        gcs_client = storage.Client()
        bucket = gcs_client.get_bucket(BUCKET_NAME)
        blob = Blob(gcs_blob_path, bucket)
        # NEXT LINE IS CAUSING ISSUES! (when run remotely)
        url = blob.generate_signed_url(datetime.timedelta(seconds=300), method='GET')
Dataflow points to the error: "AttributeError: you need a private key to sign credentials.the credentials you are currently using just contains a token."
My Dataflow job runs under a service account (the appropriate service_account_email is provided in the PipelineOptions; see the sketch below), but I don't see how I could pass that service account's .json credentials file to the Dataflow job. I suspect the job succeeds locally because I set the environment variable GOOGLE_APPLICATION_CREDENTIALS=<path to local file with service account credentials>, but how do I set it similarly for the remote Dataflow workers? Or is there another solution? Any help would be appreciated.
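For context, the pipeline options are configured roughly like this (a minimal sketch; the project, region, bucket, and service account values are placeholders rather than my real ones):

import apache_beam as beam
from apache_beam.options.pipeline_options import GoogleCloudOptions, PipelineOptions, StandardOptions

options = PipelineOptions()
options.view_as(StandardOptions).runner = 'DataflowRunner'

gcp_options = options.view_as(GoogleCloudOptions)
gcp_options.project = 'my-project'                # placeholder
gcp_options.region = 'us-central1'                # placeholder
gcp_options.temp_location = 'gs://my-bucket/tmp'  # placeholder
# Worker service account; its .json key is what GOOGLE_APPLICATION_CREDENTIALS
# points to when I run the job locally with DirectRunner.
gcp_options.service_account_email = 'my-sa@my-project.iam.gserviceaccount.com'  # placeholder

with beam.Pipeline(options=options) as pipeline:
    ...  # rest of the pipeline, including the SomeDoFn step above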