I am trying to follow this simple Dataflow example from the Google Cloud site.

I have successfully installed the Dataflow pipeline plugin and the gcloud SDK (as well as Python 2.7). I have also set up a project on Google Cloud and enabled billing and all the necessary APIs, as specified in the instructions above.

However, when I open the run configuration and, on the Pipeline Arguments tab, select BlockingDataflowPipelineRunner, then create a bucket and set my project-id, hitting run gives me:

Caused by: java.lang.IllegalArgumentException: Output path does not exist or is not writeable: gs://my-cloud-dataflow-bucket
    at com.google.cloud.dataflow.sdk.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
    at com.google.cloud.dataflow.sdk.util.DataflowPathValidator.verifyPathIsAccessible(DataflowPathValidator.java:79)
    at com.google.cloud.dataflow.sdk.util.DataflowPathValidator.validateOutputFilePrefixSupported(DataflowPathValidator.java:62)
    at com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.fromOptions(DataflowPipelineRunner.java:255)
    at com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner.fromOptions(BlockingDataflowPipelineRunner.java:82)
    ... 9 more
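
For reference, the Pipeline Arguments I am setting look roughly like this (the bucket and project names are placeholders):

    --project=my-project-id
    --stagingLocation=gs://my-cloud-dataflow-bucket/staging
    --runner=BlockingDataflowPipelineRunner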

I have run `gcloud auth login` in my terminal, and I can see in the browser that I am successfully logged in.

I am really not sure what I have done wrong here. Can anyone confirm whether this is a known issue with using the Dataflow pipeline and Google Cloud Storage buckets?

Thanks!

RoshP

  • Can you try running `gsutil ls gs://my-cloud-dataflow-bucket` on the command line? (I'll give a generic answer first, and then follow up with a more specific one once we figure out the root cause.) – Davor Bonaci Mar 19 '16 at 17:50

5 Answers

I had a similar issue with GCS bucket permissions: I certainly had write permissions and I could upload files into the bucket. What solved the problem for me was acquiring the roles/dataflow.admin role on the project I was submitting the pipeline to.
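
If it helps, granting that role from the command line looks roughly like this (the project ID and email are placeholders):

    gcloud projects add-iam-policy-binding my-project-id \
        --member="user:you@example.com" \
        --role="roles/dataflow.admin"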

deadmoto

When submitting pipelines to the Google Cloud Dataflow Service, the pipeline runner on your local machine uploads the files necessary for execution in the cloud to a "staging location" in Google Cloud Storage.

The pipeline runner on your local machine seems to be unable to write the required files to the staging location provided (gs://my-cloud-dataflow-bucket). It could be that the location doesn't exist, that it belongs to a different GCP project than the one you authenticated against, or that more specific permissions are set on that bucket, etc.

You can start debugging the issue with the gsutil command-line tool. For example, try running `gsutil ls gs://my-cloud-dataflow-bucket` to list the contents of the bucket. Then, try uploading a file with the `gsutil cp` command. This should produce enough information to root-cause the issue you are facing.
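
Concretely, the two checks might look like this; the file name is just an example:

    # List the bucket; a 403 or 404 here points at permissions or a missing bucket.
    gsutil ls gs://my-cloud-dataflow-bucket

    # Try writing to it; this exercises the same permission the runner needs.
    gsutil cp somefile.txt gs://my-cloud-dataflow-bucket/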

Davor Bonaci
  • I ran `gsutil ls gs://my-cloud-dataflow-bucket` in my terminal (current project: rosh-test) and got `AccessDeniedException: 403 Forbidden`. I should point out that in the Eclipse Dataflow plugin, when creating the project, I specified the name of the bucket and clicked 'create', and Eclipse told me the creation of the bucket was successful. However, when I check on GCP to see if the bucket exists, it says it doesn't. – RoshP Mar 20 '16 at 00:14
  • Furthermore, when I try to manually create the same bucket, it says that I can't have two buckets with the same name! Whilst in GCP, I started up gsutil and ran `gsutil acl ch -u myemail@gmail.com:W gs://my-cloud-dataflow-bucket`. However, that also gives a 403 Forbidden error. – RoshP Mar 20 '16 at 00:14
  • A few things to check: make sure your account is at least an Editor on the project, don't forget to run `gcloud auth login`. Also, when creating the bucket, make sure the project name is specified. If this fails, I suggest creating the bucket manually in the Developers Console and just using it in Eclipse. – Davor Bonaci Mar 20 '16 at 04:10
  • Hey Davor. In the GCP Storage section UI, I changed both the bucket permissions and default bucket permissions so that owners, editors, and viewers all have 'owner' permissions set. I also added a new entry for my specific email address. Via the terminal I executed `gsutil cp somefile.txt gs://my-cloud-dataflow-bucket` and saw that the file was uploaded, so the permissions seem OK. However, when I run my Eclipse program, I still get the error that the bucket does not exist or is not writeable :( – RoshP Mar 20 '16 at 13:19
  • You seem to have made progress on the issue. Before you were getting 403 for all actions; now you seem to be successfully copying files --> definite sign of progress. Could it be that your Eclipse environment somehow cannot access your home directory and/or command-line environment variables? – Davor Bonaci Mar 20 '16 at 19:30
  • How did you solve the problem? I am having the same issue; I can `gsutil cp` on the CLI but get the error on the Java side. Thanks. – Michel Hua Apr 12 '21 at 09:12
  • Has anyone found a solution to this issue? I am also seeing this error. – SudhirKumar Sep 28 '21 at 21:16

Try providing the zone parameter; that fixed a similar error in my case. And of course, export the GOOGLE_APPLICATION_CREDENTIALS environment variable before running your app.

 ...
 -Dexec.args="--runner=DataflowRunner \
 --gcpTempLocation=gs://bucket/tmp \
 --zone=bucket-zone \
 ...
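
For context, a complete Maven invocation along these lines might look like the following; the main class, project, bucket, and zone are all placeholders:

    mvn compile exec:java \
        -Dexec.mainClass=com.example.WordCount \
        -Dexec.args="--project=my-project-id \
        --runner=DataflowRunner \
        --gcpTempLocation=gs://my-bucket/tmp \
        --zone=us-central1-f"
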
Eugene

Got the same error. I fixed it by setting GOOGLE_APPLICATION_CREDENTIALS in ~/.bash_profile on Mac, pointing it at a key file for an account with write permissions.
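
Roughly, the entry looks like this (the path is a placeholder for your own service-account key file):

    # ~/.bash_profile
    export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/my-service-account.json"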

  • This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - From Review – paneerakbari Jun 21 '22 at 20:43

I realised I needed to use a specific acl command via gsutil. Setting my account to have owner permissions did not do the job. Instead, using:

gsutil acl set public-read-write gs://my-bucket-name-here

worked in this case. Hope this helps someone!
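
As the comment below notes, public-read-write opens the bucket to everyone; a narrower alternative along the same lines is to grant write access to a specific account (the email is a placeholder):

    gsutil acl ch -u you@example.com:W gs://my-bucket-name-here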

RoshP
  • We should not encourage users to set `public-read-write` on their buckets. This is not necessary. Editors of the project need to have write access, as do the service accounts. Then, you need to authenticate as one of the editors, and that should be enough. – Davor Bonaci Mar 20 '16 at 19:27