I'm trying to build a flex-template image using a service account:

        gcloud dataflow flex-template build "$TEMPLATE_PATH" \
            --image-gcr-path "$TEMPLATE_IMAGE" \
            --sdk-language "JAVA" \
            --flex-template-base-image JAVA11 \
            --metadata-file "metadata.json" \
            --jar "target/XXX.jar" \
            --env FLEX_TEMPLATE_JAVA_MAIN_CLASS="XXX"

The service account has the following roles:

  "roles/appengine.appAdmin",
  "roles/bigquery.admin",
  "roles/cloudfunctions.admin",
  "roles/cloudtasks.admin",
  "roles/compute.viewer",
  "roles/container.admin",
  "roles/dataproc.admin",
  "roles/iam.securityAdmin",
  "roles/iam.serviceAccountAdmin",
  "roles/iam.serviceAccountUser",
  "roles/iam.roleAdmin",
  "roles/resourcemanager.projectIamAdmin",
  "roles/pubsub.admin",
  "roles/serviceusage.serviceUsageAdmin",
  "roles/servicemanagement.admin",
  "roles/spanner.admin",
  "roles/storage.admin",
  "roles/storage.objectAdmin",
  "roles/firebase.admin",
  "roles/cloudconfig.admin",
  "roles/vpcaccess.admin",
  "roles/compute.instanceAdmin.v1",
  "roles/dataflow.admin",
  "roles/dataflow.serviceAgent"

However, even with the dataflow.admin and dataflow.serviceAgent roles, my service account is still unable to perform this task.

The documentation (https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates) advises granting the roles/owner role to the service account. I'm hesitant to do that: this is meant to be part of a CI/CD pipeline, and giving a service account the owner role doesn't make sense to me, unless I'm completely wrong.

Is there any way to circumvent this issue without granting the owner role to the service account?

bitnahian
  • 1) Usually the error message includes the missing permission. Post the entire error message in your question. 2) The `Owner` role is legacy and only one or two people in a project or even better at the ORG level should have that role. In your example, you have assigned a lot of admin roles (are they really necessary) which is just as bad. For example, why does a service account for Cloud Dataflow require admin rights to App Engine, Cloud Functions, etc.? – John Hanley Feb 11 '21 at 08:17
  • For testing and development, using the legacy role Editor is OK while you figure out the correct permission set. If the role Editor fails, then something else is going on. – John Hanley Feb 11 '21 at 08:18
  • Yes, this service account is merely intended for testing purposes. Some of the roles were left behind from previous development tasks, and I agree it's a bad practice. However, for this task in particular, I still am not able to put the first script as part of my CI/CD pipeline. The error message simply says: `ERROR: (gcloud.dataflow.flex-template.build) PERMISSION_DENIED: The caller does not have permission ` – bitnahian Feb 11 '21 at 22:30
  • Look at the Stackdriver logs. The API call that is failing should be logged and that might help understand which API is failing due to lack of permissions. – John Hanley Feb 11 '21 at 23:38
  • Hi, I managed to figure out that I needed to add the `roles/cloudbuild.builds.builder` for the build job to start. However, I'm encountering this error after the build step. If you could kindly help me navigate through this, I'd be really grateful. `ERROR: (gcloud.dataflow.flex-template.build) HTTPError 403: AccessDeniedAccess denied.
    dev-tf-sa@XXX.iam.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object.
    `
    – bitnahian Feb 12 '21 at 00:52
  • The service account already has the `roles/storage.objectViewer` role assigned to it. This role is presumably required to show the REMOTE BUILD OUTPUT in the console. However, the build step passes. – bitnahian Feb 12 '21 at 00:53
  • The error message says that the service account `dev-tf-sa@XXX.iam.gserviceaccount.com` does not have the permission `storage.objects.get`. Double check what roles are actually assigned. You can list the permissions that a role has with `gcloud iam roles describe roles/storage.objectViewer`. Note: `roles/cloudbuild.builds.builder` also has the permission `storage.objects.get`. I think you are assigning roles to one service account and using another. – John Hanley Feb 12 '21 at 01:17
  • I'm fairly sure now that my case is quite similar to the one discussed here https://stackoverflow.com/questions/56362244/google-cloud-build-view-logs-permissions. Thank you for your time and assistance :) – bitnahian Feb 12 '21 at 04:25
  • What roles are assigned to `@cloudbuild.gserviceaccount.com`? – John Hanley Feb 12 '21 at 04:36
  • The following roles are assigned - Cloud Build Service Account, Service Account User, Cloud Run Admin, Secret Manager Secret Accessor – bitnahian Feb 12 '21 at 04:38
  • If you assign the Viewer role to the other service account, does that solve your problem? – John Hanley Feb 12 '21 at 04:38
  • Go to the Cloud Build Settings page. https://console.cloud.google.com/cloud-build/settings. What is enabled? I am starting to think that the error is coming from Cloud Build and not your CLI command. – John Hanley Feb 12 '21 at 04:43
  • I have assigned the `roles/cloudbuild.builds.viewer` role to the dev-tf-sa account as well, but no luck. From what I can tell, the logs are written to the gs://.cloudbuild-logs.googleusercontent.com bucket, which does not allow access to my terraform-sa-account. Unfortunately, this is a special type of bucket and I haven't been able to perform any iam-policy-bindings on it either. Additionally, the flex-template build command doesn't let you specify a gcs-log-directory, so it's getting a bit complicated. However, the build does go through; only the logs aren't viewable. – bitnahian Feb 12 '21 at 04:43
  • Logs were never mentioned in your question. – John Hanley Feb 12 '21 at 04:47
  • I don't think Logs are the only part of the problem. The flex build command abstracts away pretty much everything without any documentation whatsoever. So, my best guess is that the permission error occurs when the remote log output is requested by the dataflow flex-template API. That's my hunch, and it seems that there is a current issue that's being tracked around log viewing permissions without an owner role. – bitnahian Feb 12 '21 at 04:50
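
The checks suggested in the comments above (confirm which account gcloud is actually using, list the roles really bound to it, and inspect what a role contains) can be sketched roughly as follows. This is only an illustration: `PROJECT_ID` and `CI_SA` are placeholder values, and the `run` and `check_service_account` helpers are hypothetical names introduced here. With `DRY_RUN=1` (the default in this sketch) the commands are printed rather than executed.

```shell
#!/usr/bin/env bash
# Sketch only: PROJECT_ID and CI_SA are placeholders, not values from the question.
PROJECT_ID="${PROJECT_ID:-my-project}"
CI_SA="${CI_SA:-dev-tf-sa@${PROJECT_ID}.iam.gserviceaccount.com}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

check_service_account() {
  # 1) Which account is gcloud currently authenticated as?
  run gcloud auth list --filter=status:ACTIVE "--format=value(account)"

  # 2) Which roles are actually bound to the CI service account in this project?
  run gcloud projects get-iam-policy "$PROJECT_ID" \
    "--flatten=bindings[].members" \
    "--filter=bindings.members:serviceAccount:${CI_SA}" \
    "--format=value(bindings.role)"

  # 3) Which permissions does a given role include (command taken from the comment above)?
  run gcloud iam roles describe roles/storage.objectViewer
}

check_service_account
```

Running it with `DRY_RUN=0` executes the three commands against the project; if the account printed in step 1 is not the one listed in step 2, roles are being assigned to one service account while another is in use.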

1 Answer

I just ran into the exact same issue and spent a few hours figuring it out. We use a Terraform service account as well. As you mentioned, there are two main issues: service account access and build log access.

  1. By default, Cloud Build uses a default service account of the form [project_number]@cloudbuild.gserviceaccount.com, so you need to grant this service account permission to write to the GCS bucket backing the GCR container registry. I granted roles/storage.admin to my service account.
  2. As you mentioned, Cloud Build also saves its logs by default at gs://[project_number].cloudbuild-logs.googleusercontent.com. This seems to be a hidden bucket in the project; at least, I could not see it. In addition, I couldn't configure google_storage_bucket_iam_member for it. Instead, the recommendation as per this doc is to grant roles/viewer at the project level to the service account running the gcloud dataflow ... command.

I was able to run the command successfully after the above changes.
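
For reference, the two grants above could be applied with gcloud roughly like this. It is a sketch under stated assumptions, not the exact setup used here: `PROJECT_ID`, `PROJECT_NUMBER` and `CI_SA` are placeholders, the `run` and `grant_role` helpers are hypothetical, and the roles are bound at the project level for simplicity (binding roles/storage.admin on just the registry bucket would be narrower). The sketch reads "my service account" in step 1 as the Cloud Build default service account, which is an interpretation. With `DRY_RUN=1` (the default) it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: all three values below are placeholders, not taken from the post.
PROJECT_ID="${PROJECT_ID:-my-project}"
PROJECT_NUMBER="${PROJECT_NUMBER:-123456789012}"
CI_SA="${CI_SA:-ci-sa@${PROJECT_ID}.iam.gserviceaccount.com}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Bind a role to a service account at the project level.
# $1 = service account email, $2 = role
grant_role() {
  run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    "--member=serviceAccount:$1" "--role=$2"
}

grant_flex_template_build_roles() {
  # Step 1: let the Cloud Build default service account write to the
  # GCS bucket behind the container registry (assumed interpretation).
  grant_role "${PROJECT_NUMBER}@cloudbuild.gserviceaccount.com" "roles/storage.admin"

  # Step 2: let the account running `gcloud dataflow flex-template build`
  # read the Cloud Build logs via project-level Viewer.
  grant_role "$CI_SA" "roles/viewer"
}

grant_flex_template_build_roles
```

Set `DRY_RUN=0` to apply the bindings. The question's comments also report that the calling service account needed `roles/cloudbuild.builds.builder` before the build would start; that can be added with one more `grant_role` call.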

marc_s
bsam