I have been building templates for Google Dataflow for over a year now. I never had a problem creating templates and uploading them to GCS with the options.setTemplateLocation(templatePath); call. Since today, when I create the pipeline with Pipeline.create(options); and run the Java program in Eclipse, I get the following exception:

Exception in thread "main" java.lang.RuntimeException: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions)
    at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:233)
    at org.apache.beam.sdk.util.InstanceBuilder.build(InstanceBuilder.java:162)
    at org.apache.beam.sdk.PipelineRunner.fromOptions(PipelineRunner.java:52)
    at org.apache.beam.sdk.Pipeline.create(Pipeline.java:142)
    at mypackage.PipelineCreation.getTemplatePipeline(PipelineCreation.java:34)
    at myotherpackage.Main.main(Main.java:51)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:222)
    ... 5 more
Caused by: java.lang.RuntimeException: Unable to verify that GCS bucket gs://my-projects-staging-bucket exists.
    at org.apache.beam.sdk.extensions.gcp.storage.GcsPathValidator.verifyPathIsAccessible(GcsPathValidator.java:92)
    at org.apache.beam.sdk.extensions.gcp.storage.GcsPathValidator.validateOutputFilePrefixSupported(GcsPathValidator.java:61)
    at org.apache.beam.runners.dataflow.DataflowRunner.fromOptions(DataflowRunner.java:228)
    ... 10 more
Caused by: com.google.api.client.http.HttpResponseException: 400 Bad Request
{
  "error" : "invalid_grant",
  "error_description" : "Bad Request"
}
    at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1070)
    at com.google.auth.oauth2.UserCredentials.refreshAccessToken(UserCredentials.java:207)
    at com.google.auth.oauth2.OAuth2Credentials.refresh(OAuth2Credentials.java:149)
    at com.google.auth.oauth2.OAuth2Credentials.getRequestMetadata(OAuth2Credentials.java:135)
    at com.google.auth.http.HttpCredentialsAdapter.initialize(HttpCredentialsAdapter.java:96)
    at com.google.cloud.hadoop.util.ChainingHttpRequestInitializer.initialize(ChainingHttpRequestInitializer.java:52)
    at com.google.api.client.http.HttpRequestFactory.buildRequest(HttpRequestFactory.java:93)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.buildHttpRequest(AbstractGoogleClientRequest.java:300)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
    at com.google.cloud.hadoop.util.ResilientOperation$AbstractGoogleClientRequestExecutor.call(ResilientOperation.java:166)
    at com.google.cloud.hadoop.util.ResilientOperation.retry(ResilientOperation.java:66)
    at org.apache.beam.sdk.util.GcsUtil.getBucket(GcsUtil.java:505)
    at org.apache.beam.sdk.util.GcsUtil.bucketAccessible(GcsUtil.java:492)
    at org.apache.beam.sdk.util.GcsUtil.bucketAccessible(GcsUtil.java:457)
    at org.apache.beam.sdk.extensions.gcp.storage.GcsPathValidator.verifyPathIsAccessible(GcsPathValidator.java:88)
    ... 12 more

Earlier today I was logged in to gcloud with another account, but I have since logged in again with gcloud auth login, using the account that is "Owner" of the project. I also restarted Eclipse, but the same error keeps occurring. When I try to run the pipeline locally, I get a different error, but it also contains the "invalid_grant" / "Bad Request" content. Restarting the laptop had no effect either.

My pom declares google-cloud-dataflow-java-sdk-all at version 2.2.0; upgrading to 2.5.0 had no effect.

I am able to copy data to the bucket with gsutil from the command line. But when I run the Java program from the command line with mvn compile exec:java -Dexec.mainClass=mypackage.Main, I still get the same errors.

My function to create a templatePipeline looks like the following:

public static Pipeline getTemplatePipeline(String jobName, String templatePath) {
    DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    options.setProject("my-project-id");
    options.setRunner(DataflowRunner.class);
    options.setStagingLocation("gs://my-projects-staging-bucket/binaries");
    options.setTempLocation("gs://my-projects-staging-bucket/binaries/tmp");
    options.setGcpTempLocation("gs://my-projects-staging-bucket/binaries/tmp");
    options.setZone("europe-west3-a");
    options.setWorkerMachineType("n1-standard-2");
    options.setJobName(jobName);
    options.setMaxNumWorkers(2);
    options.setDiskSizeGb(40);
    options.setTemplateLocation(templatePath);
    return Pipeline.create(options);
}

Any help is highly appreciated.

Malte

2 Answers

You don't need a service account; you can keep using gcloud. Run the following command and log in with your account, so that the Application Default Credentials read by the Java client libraries are refreshed:

gcloud auth application-default login
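
The stack trace in the question ends in UserCredentials.refreshAccessToken, which suggests the Java libraries picked up a stored user credential whose refresh token was rejected. Those libraries read Application Default Credentials, not the login state set by gcloud auth login. As a rough diagnostic, the sketch below prints which file would be consulted first. It is a simplification, not the library's actual code: the class name AdcLocator and the method candidate are hypothetical, and it ignores the CLOUDSDK_CONFIG override and the metadata-server fallback.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: a simplified view of where Application Default
// Credentials are looked up on a developer machine.
final class AdcLocator {

    private static final String WELL_KNOWN_FILE = "application_default_credentials.json";

    // Returns the credentials file that would be consulted first.
    static Path candidate(Map<String, String> env, String userHome, boolean isWindows) {
        // 1. An explicit key file wins if the environment variable is set.
        String explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS");
        if (explicit != null && !explicit.isEmpty()) {
            return Paths.get(explicit);
        }
        // 2. Otherwise the file written by "gcloud auth application-default login".
        if (isWindows) {
            String appData = env.get("APPDATA");
            String base = (appData != null && !appData.isEmpty()) ? appData : userHome;
            return Paths.get(base, "gcloud", WELL_KNOWN_FILE);
        }
        return Paths.get(userHome, ".config", "gcloud", WELL_KNOWN_FILE);
    }

    public static void main(String[] args) {
        boolean isWindows = System.getProperty("os.name")
                .toLowerCase(Locale.ROOT)
                .startsWith("windows");
        Path file = candidate(System.getenv(), System.getProperty("user.home"), isWindows);
        System.out.println("ADC file to inspect: " + file);
        System.out.println("Exists: " + Files.exists(file));
    }
}
```

If the printed file exists but is stale, running the command above should rewrite it with a fresh refresh token.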
Majico
  • This is not recommended and generates a warning: "WARNING: Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a "quota exceeded" or "API not enabled" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/." – vdolez Apr 01 '21 at 08:43

I found the solution in the quickstart docs.

It seems that the gcloud auth login is no longer used and that you have to use a service account. So, as described in the docs, I created a service account with the role "project/owner" and downloaded its JSON key file to $path.

Then, on my Mac, I ran export GOOGLE_APPLICATION_CREDENTIALS="$path" and, within the same shell session, used the command mentioned in the question to compile and execute the Java program.
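
To double-check that the file behind GOOGLE_APPLICATION_CREDENTIALS really is a service-account key and not a stored user login, one can look at its "type" field: service-account key files carry "service_account", while the file written by gcloud for user logins carries "authorized_user". The following is only a minimal sketch using the standard library; the class name CredentialType is hypothetical, and the regex simply takes the first "type" key instead of parsing the JSON properly.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reports the "type" field of a Google credentials JSON file.
final class CredentialType {

    // Matches the first "type": "<value>" pair; good enough for a quick check,
    // but not a real JSON parser.
    private static final Pattern TYPE = Pattern.compile("\"type\"\\s*:\\s*\"([^\"]+)\"");

    static String of(String json) {
        Matcher m = TYPE.matcher(json);
        return m.find() ? m.group(1) : "unknown";
    }

    public static void main(String[] args) throws IOException {
        // Use the path given on the command line, else the environment variable.
        String path = args.length > 0 ? args[0] : System.getenv("GOOGLE_APPLICATION_CREDENTIALS");
        if (path == null || path.isEmpty()) {
            System.out.println("No credentials file given");
            return;
        }
        String json = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        System.out.println("Credential type: " + of(json));
    }
}
```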

Malte
  • Try not to misguide someone if you are not 100% sure about something. gcloud auth is still being used, and setting GOOGLE_APPLICATION_CREDENTIALS is also an option. Both work fine. – rand0mb0t May 21 '20 at 13:59