I am writing a Python script to check whether some files in Google Cloud Storage contain PII. The script is below:

import threading

import google.cloud.dlp_v2

dlp = google.cloud.dlp_v2.DlpServiceClient()

url = "gs://{}/{}".format("my-bucket-name", "my_file_name")
storage_config = {"cloud_storage_options": {"file_set": {"url": url}}}

parent = dlp.project_path("my-project-name")

# inspect_config (the info types to look for) is defined elsewhere in the script.
inspect_job = {
    "inspect_config": inspect_config,
    "storage_config": storage_config
}

operation = dlp.create_dlp_job(parent, inspect_job=inspect_job)

job_done = threading.Event()

job = dlp.get_dlp_job(operation.name)
try:
    if job.inspect_details.result.info_type_stats:
        for finding in job.inspect_details.result.info_type_stats:
            print("Info type: {}; Count: {}".format(finding.info_type.name, finding.count))
    else:
        print("No findings.")
    job_done.set()
except Exception as e:
    print(e)
    raise

finished = job_done.wait(timeout=3000)
if not finished:
    print(
        "No event received before the timeout. Please verify that the "
        "subscription provided is subscribed to the topic provided."
    )
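
A note on the snippet above: it fetches the job only once, right after creating it, so the job may not have finished by the time the results are read. A minimal polling sketch follows, assuming the caller supplies both the fetch function and the "finished" test. `wait_for_job`, `fetch_job` and `is_finished` are hypothetical names for illustration, not part of the DLP client library.

```python
import time


def wait_for_job(fetch_job, job_name, is_finished,
                 timeout=300.0, poll_interval=5.0, sleep=time.sleep):
    """Poll fetch_job(job_name) until is_finished(job) is true.

    Returns the finished job, or None if the timeout expires first.
    fetch_job and is_finished are supplied by the caller (assumptions):
    for example, fetch_job could be the client's get_dlp_job method.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_name)
        if is_finished(job):
            return job
        if time.monotonic() >= deadline:
            return None
        sleep(poll_interval)

# Hypothetical usage with the client from the question:
# job = wait_for_job(dlp.get_dlp_job, operation.name, is_finished=my_predicate)
```

This does not address the 403 itself; it only matters once the job can be created.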

I read in the documentation that the DLP API creates a service account of its own with the required set of permissions:

When the Cloud DLP is enabled, a service account is added to the project.

To access both Google Cloud resources and execute calls to Cloud DLP by means of a JobTrigger, Cloud DLP uses the credentials of the Google APIs service account to authenticate to other APIs. The Google APIs service account is designed specifically to run internal Google processes on your behalf. The service account is identifiable using the email:

service-[PROJECT_NUMBER]@dlp-api.iam.gserviceaccount.com

When I run the code, I get a 403 error stating that it does not have the required permission dlp.jobs.create. I updated the IAM policy for the account to include a custom role with the permissions listed below (this is only a project for learning purposes).

dlp.analyzeRiskTemplates.create
dlp.analyzeRiskTemplates.delete
dlp.analyzeRiskTemplates.get
dlp.analyzeRiskTemplates.list
dlp.analyzeRiskTemplates.update
dlp.deidentifyTemplates.create
dlp.deidentifyTemplates.delete
dlp.deidentifyTemplates.get
dlp.deidentifyTemplates.list
dlp.deidentifyTemplates.update
dlp.inspectTemplates.create
dlp.inspectTemplates.delete
dlp.inspectTemplates.get
dlp.inspectTemplates.list
dlp.inspectTemplates.update
dlp.jobTriggers.create
dlp.jobTriggers.delete
dlp.jobTriggers.get
dlp.jobTriggers.list
dlp.jobTriggers.update
dlp.jobs.cancel
dlp.jobs.create
dlp.jobs.delete
dlp.jobs.get
dlp.jobs.list
dlp.kms.encrypt
dlp.storedInfoTypes.create
dlp.storedInfoTypes.delete
dlp.storedInfoTypes.get
dlp.storedInfoTypes.list
dlp.storedInfoTypes.update
serviceusage.services.use

My service account has two separate permission sets:

  1. DLP permissions, granted separately:

    • dlp.jobs.create
    • dlp.jobs.cancel
    • dlp.jobs.delete
    • dlp.jobs.get
    • dlp.jobs.list

  2. Owner permission, so it has unrestricted access to all Google resources:

    • roles/owner

However, when I run the script now, it still gives the following error:

grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.PERMISSION_DENIED
    details = "Not allowed, access denied for permission dlp.jobs.create."
    debug_error_string = "{"created":"@1581682593.219000000","description":"Error received from peer ipv4:xxx.xxx.x.x","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Not allowed, access denied for permission dlp.jobs.create.","grpc_status":7}"


google.api_core.exceptions.PermissionDenied: 403 Not allowed, access denied for permission dlp.jobs.create.
Akash
  • Does your service account `service-[PROJECT_NUMBER]@dlp-api.iam.gserviceaccount.com` have all the necessary permissions to write data to the bucket? – Serhii Rohoza Feb 14 '20 at 14:31
  • Yes, the service account has been given the "Storage Admin" role, so it has access to all objects in the bucket. – Akash Feb 14 '20 at 14:33
  • Could you provide the full script and update the list of permissions in the question so that I can run it on my test project? – Serhii Rohoza Feb 14 '20 at 14:39
  • @SerhiiRohoza This is the exact script I am running. You just need to provide the bucket_name and project_name, and set up the credentials for a service account with roles/dlp.user. Use this service account to set GOOGLE_APPLICATION_CREDENTIALS in the system variables. Ensure you have a file in the GCS bucket with the same name, containing some text to validate against any DLP rule. Do you want me to share the inspect config as well? – Akash Feb 17 '20 at 09:32
  • @SerhiiRohoza Added code is a mini version of https://cloud.google.com/dlp/docs/inspecting-text#inspecting_a_text_file – Akash Feb 17 '20 at 09:37
  • It would be better if I checked it with your script, to rule out coding-related errors. Also, please update the permissions list in your question if your service account does have all the necessary permissions to write data to the bucket. – Serhii Rohoza Feb 17 '20 at 14:01
  • @SerhiiRohoza The actual script pulls data from a database and is part of a business product, so I can only share the snippet. You just need to provide the values for the bucket_name, file_name and service account; the rest does not require the entire script. If the result set is printed, it is working. – Akash Feb 17 '20 at 14:17
  • 1. Could you check your active account with the command `gcloud auth list`? Then update your question with the command output and the permissions of your active account. 2. Could you activate your DLP service account with the command `gcloud auth activate-service-account` and run your script again? It would be helpful to see the results in your question. 3. Could you clarify how you use your DLP service account in the script? – Serhii Rohoza Feb 18 '20 at 15:10
  • Did you solve your problem? – Serhii Rohoza Feb 25 '20 at 14:06
  • @SerhiiRohoza I had to move to a different task, so this is postponed for now. Unfortunately I am unable to test it at the moment. I will get back with more info when I can. – Akash Mar 04 '20 at 09:14

1 Answer

Please read the official documentation, "Authenticating to the Cloud DLP API":

You must authenticate to the Cloud DLP API in order to use it

You have to use a service account to authenticate to the Cloud DLP API:

  1. Create a Service Account

  2. Grant it the required role (roles/dlp.user)

  3. Download the key.json file

  4. Set the env variable:

    export GOOGLE_APPLICATION_CREDENTIALS=[PATH_TO_KEY_FILE]
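
Since the client library picks up whatever key file this variable points to, it can help to confirm which service account that actually is before running the script. Below is a minimal sketch, assuming a standard service account key file (which carries `client_email` and `project_id` fields); `describe_active_key` is a hypothetical helper name, not part of any Google library.

```python
import json
import os


def describe_active_key(env=os.environ):
    """Return (client_email, project_id) read from the key file that
    GOOGLE_APPLICATION_CREDENTIALS points to, or None if it is unset."""
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        return None
    with open(key_path, encoding="utf-8") as handle:
        key = json.load(handle)
    # Either field may be missing if the file is not a service account key.
    return key.get("client_email"), key.get("project_id")
```

Calling `describe_active_key()` in the same shell that runs the script shows whether the account and project match the ones that were granted roles/dlp.user.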
    

Regarding the service account that you mentioned, service-[PROJECT_NUMBER]@dlp-api.iam.gserviceaccount.com:

The Google APIs service account is designed specifically to run internal Google processes on your behalf

Please read about the types of service accounts:

  1. User-managed service accounts

  2. Google-managed service account

EDIT

1. Created a new project

gcloud projects create class-run
gcloud config set project class-run

2. Enabled billing

3. Enabled the DLP API

4. Tested using the steps in "To set up a scan job of a Cloud Storage bucket using Cloud DLP"

5. Created a Pub/Sub topic and a Pub/Sub subscription

6. Created a service account, assigned the role and downloaded the key

gcloud iam service-accounts create test-cloud --description "test-cloud" --display-name "test-cloud"
gcloud projects add-iam-policy-binding class-run --member serviceAccount:test-cloud@class-run.iam.gserviceaccount.com --role roles/dlp.user
gcloud iam service-accounts keys create key.json --iam-account test-cloud@class-run.iam.gserviceaccount.com

7. Set the default credentials

export GOOGLE_APPLICATION_CREDENTIALS=/home/user/key.json

8. Ran the script, and everything worked as expected.
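
To double-check that the role binding from step 6 actually landed on the project, the policy can be dumped with `gcloud projects get-iam-policy class-run --format=json` and inspected. A minimal sketch of that inspection follows; `members_with_role` is a hypothetical helper, and `example_policy` is made-up data that only mimics the shape of that command's output.

```python
def members_with_role(policy, role):
    """Return the sorted members bound to `role` in an IAM policy dict."""
    members = set()
    for binding in policy.get("bindings", []):
        if binding.get("role") == role:
            members.update(binding.get("members", []))
    return sorted(members)


# Hypothetical policy, shaped like the JSON printed by get-iam-policy.
example_policy = {
    "bindings": [
        {
            "role": "roles/dlp.user",
            "members": [
                "serviceAccount:test-cloud@class-run.iam.gserviceaccount.com"
            ],
        },
        {"role": "roles/owner", "members": ["user:someone@example.com"]},
    ]
}

print(members_with_role(example_policy, "roles/dlp.user"))
```

If the service account from the key file does not appear under roles/dlp.user for the project the script targets, a 403 on dlp.jobs.create would be expected.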

marian.vladoi
  • Unfortunately this gives the same error as the previous setup. I had followed these exact steps earlier as well, and I still get the same errors. – Akash Feb 17 '20 at 09:09