Questions related to the Google Cloud Data Loss Prevention API. Classification and de-identification of sensitive data like PII. Works on text and images.
Questions tagged [google-cloud-dlp]
132 questions
1
vote
1 answer
How to whitelist URLs with Cloud DLP?
I use Cloud DLP to identify sensitive data, but I want to allow (whitelist) some data so that DLP does not flag it.
For example, by default I want to match the URL infoType so that DLP identifies URLs. But I also want to allow certain URLs from google.com and yahoo.com.…
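One possible approach, sketched here as a request body rather than a confirmed answer, is an exclusion rule attached to the URL infoType. The field names follow the DLP REST `InspectConfig` shape (`ruleSet`, `exclusionRule`, `regex`, `matchingType`); the regular expression itself is a hypothetical allow-list pattern, not something taken from the question, and the `is_allow_listed` helper only checks the pattern locally without calling DLP.

```python
import re

# Hypothetical allow-list pattern (an assumption, not from the question).
# It uses only constructs that Python's `re` and RE2 both support.
ALLOWED_URL_PATTERN = r"(https?://)?([a-z0-9-]+\.)*(google|yahoo)\.com(/\S*)?"


def build_inspect_config(pattern: str = ALLOWED_URL_PATTERN) -> dict:
    """Build an InspectConfig body that reports URL findings except allow-listed ones."""
    return {
        "infoTypes": [{"name": "URL"}],
        "ruleSet": [
            {
                # The rule applies only to findings of the URL infoType.
                "infoTypes": [{"name": "URL"}],
                "rules": [
                    {
                        "exclusionRule": {
                            "regex": {"pattern": pattern},
                            # Drop a finding only when the whole match fits the pattern.
                            "matchingType": "MATCHING_TYPE_FULL_MATCH",
                        }
                    }
                ],
            }
        ],
    }


def is_allow_listed(url: str, pattern: str = ALLOWED_URL_PATTERN) -> bool:
    """Check the pattern locally; this does not call the DLP API."""
    return re.fullmatch(pattern, url) is not None
```

The dictionary can be serialized as the `inspectConfig` field of a `content:inspect` request. Full-match exclusion is used so that a look-alike host such as google.com.evil.com is still reported.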

FlutterFirebase
- 2,163
- 6
- 28
- 60
1
vote
2 answers
Text dictionary like transformation in Google DLP
I would like the data to be masked while still making it possible to tell how many people studied at UNIVERSITY_1.
What de-identification transformation can I use to accomplish this kind of information/text masking?
Input:
{
"students": [
{
…
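The property being asked for (masked values that can still be counted) is what deterministic pseudonymization provides: equal inputs map to equal tokens. The sketch below illustrates the idea locally with a keyed HMAC from the standard library. It is not a DLP call, and the key and the sample values other than UNIVERSITY_1 are hypothetical. DLP has managed deterministic transformations of its own (for example `CryptoHashConfig`), which would be the analogous server-side option.

```python
import base64
import hashlib
import hmac
from collections import Counter

# Hypothetical demo key; a real key would come from a key management service.
SECRET_KEY = b"hypothetical-demo-key"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Map a value to a stable token: equal inputs always yield equal tokens."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    # Shorten the encoded digest for readability in this illustration.
    return base64.urlsafe_b64encode(digest).decode("ascii")[:16]


def count_by_token(values) -> Counter:
    """Count occurrences per token without exposing the original values."""
    return Counter(pseudonymize(v) for v in values)


if __name__ == "__main__":
    universities = ["UNIVERSITY_1", "UNIVERSITY_1", "UNIVERSITY_2"]
    for token, count in count_by_token(universities).items():
        print(token, count)
```

Because the mapping is keyed, someone without the key cannot recompute a token from a guessed university name, while group sizes remain visible.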

Вадим По
- 11
- 1
1
vote
1 answer
Using Google Cloud DLP Stored infotype and getting 400 Invalid built-in info type name
I have a stored infotype in the ready state on my DLP dashboard. Its name is Federal_Income_Tax.
Using the Python example at https://cloud.google.com/dlp/docs/concepts-infotypes I see the following:
info_types = [{"name": info_type} for info_type…
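The excerpt is truncated, so the actual cause is not visible here. One plausible reading is that the stored infoType name was passed in `info_types`, which is reserved for built-in detectors. A minimal sketch of the alternative, assuming the Python client's snake_case request shape, references the stored infoType under `custom_info_types` via `stored_type`. The project ID and resource path below are hypothetical placeholders.

```python
def build_inspect_config(stored_info_type_name: str, label: str) -> dict:
    """Build an inspect_config that references a stored infoType as a custom detector.

    stored_info_type_name: full resource name of the stored infoType.
    label: the infoType name that findings from this detector will carry.
    """
    return {
        "custom_info_types": [
            {
                "info_type": {"name": label},
                "stored_type": {"name": stored_info_type_name},
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical resource name; replace the project ID with a real one.
    config = build_inspect_config(
        "projects/my-project/storedInfoTypes/Federal_Income_Tax",
        "Federal_Income_Tax",
    )
    print(config)
```

The resulting dictionary would be passed as `inspect_config` in an inspect request, in place of the built-in `info_types` list from the documentation example.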

Craig Hurlbut
- 15
- 2
1
vote
1 answer
Python GCP API not able to read environmental values
I am developing a DLP client and setting GOOGLE_APPLICATION_CREDENTIALS from the Windows shell.
The API call fails with the following error:
google.auth.exceptions.DefaultCredentialsError: File "XXXXX.json" was not found.
When set in code…
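A quick way to narrow this down is to check, from inside the same Python process, what the variable actually contains and whether it points at a real file. The helper below is a diagnostic sketch using only the standard library. Stripping surrounding quotes reflects an assumption about one possible cause: `set VAR="path"` in cmd.exe stores the quotes as part of the value.

```python
import os
from pathlib import Path


def resolve_credentials_path(env=None) -> Path:
    """Return the credentials file path, or raise with a specific reason."""
    env = os.environ if env is None else env
    raw = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not raw:
        raise RuntimeError(
            "GOOGLE_APPLICATION_CREDENTIALS is not set in this process"
        )
    # Remove stray whitespace and surrounding quotes before checking the path.
    path = Path(raw.strip().strip('"')).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Credentials file not found: {path!s}")
    return path


if __name__ == "__main__":
    try:
        print("Using credentials at", resolve_credentials_path())
    except (RuntimeError, FileNotFoundError) as exc:
        print("Problem:", exc)
```

If the variable is missing here but present in the shell, the process was likely started from a different session than the one where it was set.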

Arnab Mukherjee
- 190
- 3
- 18
1
vote
1 answer
Data Loss Prevention finds superfluous entities when masking email
I am calling the DLP API to mask person names and email addresses in text, using the following request:
Request
{
"item": {
"value": "Eleanor Rigby\nPharmacist\neleanor.rigby@example.com"
},
"deidentifyConfig": {
…

Marek Grzenkowicz
- 17,024
- 9
- 81
- 111
1
vote
2 answers
DLP data scan from bigquery table showing start byte as null
I scanned a BigQuery table from the Google DLP console, and the scan results were saved back into a BigQuery table. DLP identified sensitive information, but the start byte is shown as null. Can anyone help me understand why?
The source data…

Kuwali
- 233
- 3
- 13
1
vote
1 answer
Exporting Google Drive/Docs files to Google Cloud Storage
We need to scan files with Google DLP. However, Google DLP scanning is only supported in GCS. (https://cloud.google.com/blog/products/identity-security/take-charge-of-your-data-scan-for-sensitive-data-in-just-a-few-clicks)
So I need to export the…

hans
- 11
- 1
1
vote
1 answer
De-identifying storages in Google Cloud DLP
I was using a Dataflow streaming template for DLP de-identification from GCS to BQ, but I wanted a batch solution. I found cloud.google.com/dlp/docs/deidentify-storage, which provides a new "deidentify" action for the create_dlp_job function. When I…

Akshay Chandru
- 43
- 4
1
vote
1 answer
Getting 403 Permission Denied with GCP DLP API
I am writing a Python script to check whether some files in Google Cloud Storage contain PII. The script is as follows:
dlp = google.cloud.dlp_v2.DlpServiceClient()
url = "gs://{}/{}".format("my-bucket-name",…

Akash
- 387
- 1
- 5
- 19
1
vote
1 answer
Google Cloud DLP API: Default Secure Communication
We are using the Java library com.google.cloud:google-cloud-dlp to make Google Cloud DLP calls. On the client side, we are using DlpServiceClient. I know for a fact that it uses gRPC internally. I am wondering whether the default communication uses…

Sreedhar
- 35
- 4
1
vote
1 answer
Google Cloud DLP tag in Data Catalog shows Job State as pending?
I first created a custom template in DLP (with custom detectors), then created a DLP job using the new template against a BQ table and ran the job with the publish to Data Catalog setting.
The DLP job completed but the DLP tag in Data…

John
- 39
- 2
1
vote
1 answer
How to read a Parquet file from a GCS bucket and de-identify a specific column using the DLP API?
The following is my JSON object for the DLP API call to mask a specific column of data in a Parquet file stored in a GCS bucket. When calling the dlp.deidentify_content() method I have to pass an item to it, and I am not sure how to pass the Parquet file. I have already…

Lobsnag Tsering
- 11
- 6
1
vote
1 answer
Text limit for Google DLP
I can't seem to find the text size limit for de-identifying text.
https://cloud.google.com/dlp/limits says there is a 4KB limit for each quote.
What does "quote" mean? Does it mean a string?

Rahadian Kumang
- 591
- 6
- 15
1
vote
1 answer
How to get a valid token to use the GCP Data Loss Prevention API on a local machine where the SDK is installed?
Right now I can't get the Google Cloud Platform Data Loss Prevention (DLP) client library for Python working behind an SSL proxy (it works fine with other GCP client libraries, for example for storage or…

Dr. Fabien Tarrade
- 1,556
- 6
- 23
- 49
1
vote
3 answers
How to get the location of the scanned file when using Google Cloud DLP API?
I'm scanning a nested directory in a Cloud Storage bucket. The result doesn't contain the matched value (quote) even though I have include_quote turned on. Also, how do I get the names of the files that contain matches, along with the matched values? I'm…

Kiso
- 13
- 4