
I am developing a PII de-identification application using Cloud Data Loss Prevention (DLP) on GCP. I am using a de-identification template for the de-identification rules.

Issue: I can't figure out how to use custom info types in the de-identification template.

Here is a sample de-identification template:

{
  "deidentifyTemplate":{
    "displayName":"Email and id masker",
    "description":"De-identifies emails and ids with a series of asterisks.",
    "deidentifyConfig":{
      "infoTypeTransformations":{
        "transformations":[
          {
            "infoTypes":[
              {
                "name":"EMAIL_ADDRESS"
              }
            ],
            "primitiveTransformation":{
              "characterMaskConfig":{
                "maskingCharacter":"*"
              }
            }
          }
        ]
      }
    }
  }
}

The example above uses a built-in info type (EMAIL_ADDRESS). In the documentation, the custom info type snippet looks like this:

    "inspect_config":{
      "custom_info_types":[
        {
          "info_type":{
            "name":"CUSTOM_ID"
          },
          "regex":{
            "pattern":"[1-9]{2}-[1-9]{4}"
          },
          "likelihood":"POSSIBLE"
        }
      ]
    }
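Before putting a regex into a custom info type, it can help to try it locally. This is a minimal sketch using Python's `re` module; Cloud DLP evaluates patterns with RE2 syntax, so Python's engine is only an approximation, though it behaves the same for a simple pattern like this one. The helper name `find_custom_ids` and the sample string are made up for illustration.

```python
import re

# Pattern from the CUSTOM_ID snippet above: two digits, a hyphen, four digits.
# Note that [1-9] excludes 0, so an ID containing a zero will not match.
CUSTOM_ID_PATTERN = re.compile(r"[1-9]{2}-[1-9]{4}")


def find_custom_ids(text):
    """Return every substring of text that the CUSTOM_ID pattern matches."""
    return CUSTOM_ID_PATTERN.findall(text)


print(find_custom_ids("ids: 12-3456, 10-2345, 98-7654"))
# ['12-3456', '98-7654']
```

The middle ID is skipped only because it contains a 0; if zeros are valid in your IDs, `[0-9]` (or `\d`) is the safer character class.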

The REST documentation for de-identification templates has no valid object definition for inspect_config; it is only valid in inspection templates.

Is it possible to use custom info types in a de-identification template (infoTypeTransformations)?

Here is the link to the REST documentation.

Arnab Mukherjee
  • There's very little detail in your question. Could you please share what you have already tried so the community can start from there to help you? [Here](https://www.youtube.com/watch?v=LxSwhRSHeDQ) is a video on how to ask a good Stack Overflow Question. – Judith Guzman Dec 01 '20 at 21:00
  • @JudithGuzmán added as per your request. Please let me know your thoughts. – Arnab Mukherjee Dec 05 '20 at 06:34
  • @JudithGuzmán can you share your input? I edited the question based on your feedback – Arnab Mukherjee Dec 06 '20 at 16:15
  • It's discouraging that the community does not offer unbiased criticism. Update: the above feature is not feasible. We can't use custom info types with a de-identification template in `infoTypeTransformations` – Arnab Mukherjee Dec 08 '20 at 12:42
  • Hey, I'm only now seeing your comments, as I was on a family trip. Thanks for editing your question; now your concern is clearer and I see where you're coming from. One question: have you tried using the custom infotype with the [surrogateInfoType](https://cloud.google.com/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#DeidentifyTemplate.CryptoReplaceFfxFpeConfig.FIELDS.surrogate_info_type)? It seems like it accepts custom info_types. Also, have you tried the API first [here](https://cloud.google.com/dlp/docs/reference/rest/v2/projects.content/deidentify)? – Judith Guzman Dec 08 '20 at 16:29
  • Hey, thanks for the insights. I will test and let you know. – Arnab Mukherjee Dec 08 '20 at 17:09
  • @JudithGuzmán It seems like surrogate info types are specific to encryption. My use case is this: suppose I have a large number of text files and I want to mask `IP` addresses. To do that, I need the ability to find that custom pattern (using regex) and apply the masking transformation. IP is not a built-in type, and instead of IP it could be anything else as well. With `inspection` templates it's simple and we have documentation examples, but the issue is with `de-identification` – Arnab Mukherjee Dec 11 '20 at 07:15
  • Added the [solution](https://stackoverflow.com/a/65266295/5739950) – Arnab Mukherjee Dec 16 '20 at 05:09

2 Answers


Yes, it is possible to use custom info types. You need to create both a de-identify template and an inspect template.

Then, when you call the API, pass both templates as parameters. Here is some sample code in Python using the DLP client library:

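from google.cloud import dlp_v2

dlp_client = dlp_v2.DlpServiceClient()
response = dlp_client.deidentify_content(
    request={
        # inspect template: holds the custom info type definitions
        "inspect_template_name": "projects/<project>/locations/global/inspectTemplates/<templateId>",
        # de-identify template: holds the infoTypeTransformations
        "deidentify_template_name": "projects/<project>/locations/global/deidentifyTemplates/<templateId>",
        "parent": "projects/<project>/locations/global",
        "item": {"value": "<text to de-identify>"},
    }
)
print(response.item.value)  # the de-identified text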
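The inspect template referenced above is where the custom info type definition lives. A minimal sketch of the request for creating it, assuming the Python client's snake_case field names and reusing the CUSTOM_ID regex from the question; the helper `build_inspect_template_request`, the project `my-project`, and the template id `custom-id-inspect` are hypothetical. The function only builds the request dict, so it runs without credentials.

```python
def build_inspect_template_request(parent, template_id, name, pattern):
    """Build a create_inspect_template request defining one regex custom info type."""
    return {
        "parent": parent,
        "template_id": template_id,
        "inspect_template": {
            "display_name": f"{name} detector",
            "inspect_config": {
                # Custom info types are defined here, not in the de-identify template
                "custom_info_types": [
                    {
                        "info_type": {"name": name},
                        "regex": {"pattern": pattern},
                        "likelihood": "POSSIBLE",
                    }
                ]
            },
        },
    }


request = build_inspect_template_request(
    "projects/my-project/locations/global",
    "custom-id-inspect",
    "CUSTOM_ID",
    r"[1-9]{2}-[1-9]{4}",
)
# With credentials, this could be sent with:
#   dlp_v2.DlpServiceClient().create_inspect_template(request=request)
```

The de-identify template can then list `{"name": "CUSTOM_ID"}` in its `infoTypes`, and both template names are passed together to `deidentify_content`.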
Nek
  • Sounds interesting, but then we need to maintain two templates. I found a simpler approach: creating a `stored` info type. I am currently exploring it. If it works, I will write an answer; otherwise, I will try yours as a fallback. – Arnab Mukherjee Dec 12 '20 at 14:13
  • I like your approach as well, so I'm upvoting it. It's good to know multiple approaches to a single problem. Let me know if you find my [approach](https://stackoverflow.com/a/65266295/5739950) interesting as well – Arnab Mukherjee Dec 12 '20 at 14:52

We can use custom info types in a de-identification template by using stored info types.

We can create a stored info type with API calls, and that stored info type can then be referenced like a built-in info type.

Creating a stored info type

  • A few global variables and dependencies
from google.cloud import dlp_v2
import os

dlp = dlp_v2.DlpServiceClient()
default_project = os.environ['GOOGLE_PROJECT']  # project id
parent = f"projects/{default_project}"

# details of the custom info type
custom_info_id = "<unique-id>"  # example: IP_ADDRESS
custom_info_id_pattern = r"<regex pattern>"
  • Creating the request payload
info_config = {
    "display_name": custom_info_id,
    "description": custom_info_id,
    # regex that detects the custom info type
    "regex": {"pattern": custom_info_id_pattern},
}
  • Making the API call
response = dlp.create_stored_info_type(request={
    "parent": parent,
    "config": info_config,
    "stored_info_type_id": custom_info_id
})
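Since a bad pattern is only discovered after the stored info type is built, it may be worth sanity-checking the regex locally first. A minimal sketch, assuming hypothetical values for the placeholders above: `IP_ADDRESS` as the id (the example from the code comment) and an illustrative IPv4 pattern that does not check that each octet is at most 255. `check_pattern` is a hypothetical helper; Cloud DLP uses RE2 syntax, which agrees with Python's `re` for this pattern but not for every construct.

```python
import re

# Hypothetical placeholder values; the pattern is illustrative only
# (it accepts octets above 255, e.g. 999.1.1.1).
custom_info_id = "IP_ADDRESS"
custom_info_id_pattern = r"\b(?:\d{1,3}\.){3}\d{1,3}\b"


def check_pattern(pattern, should_match, should_not_match):
    """Return (samples the pattern missed, samples it wrongly matched)."""
    compiled = re.compile(pattern)
    missed = [s for s in should_match if not compiled.search(s)]
    false_hits = [s for s in should_not_match if compiled.search(s)]
    return missed, false_hits


missed, false_hits = check_pattern(
    custom_info_id_pattern,
    should_match=["host 10.0.0.1 is up", "192.168.1.254"],
    should_not_match=["version 1.2.3", "no address here"],
)
print(missed, false_hits)  # [] []
```

Two empty lists mean the samples behaved as expected; only then would `info_config` be sent with `create_stored_info_type`.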

How to reference the stored info type

  • Use the stored_info_type_id in the de-identification template for the transformation:
          {
            "info_types":[
              {
                "name":"IP_ADDRESS"  # this is defined stored_info_type_id
              }
            ],
            "primitive_transformation":{
              "character_mask_config":{
                "characters_to_ignore":[
                  {
                    "characters_to_skip":"."
                  }
                ],
                "masking_character":"*"
              }
            }
          },
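To get a feel for what this transformation produces, here is a local approximation in plain Python. It is not the DLP service, only a sketch of `character_mask_config` with `masking_character` "*" and `characters_to_skip` "."; it ignores other CharacterMaskConfig options such as `number_to_mask` and `reverse_order`. The helper `mask_matches` and the sample log line are hypothetical, and the IPv4 pattern is the same illustrative one used earlier.

```python
import re


def mask_matches(text, pattern, masking_character="*", characters_to_skip="."):
    """Mask every regex match, leaving characters in characters_to_skip as-is."""

    def mask(match):
        return "".join(
            ch if ch in characters_to_skip else masking_character
            for ch in match.group(0)
        )

    return re.sub(pattern, mask, text)


ip_pattern = r"\b(?:\d{1,3}\.){3}\d{1,3}\b"
print(mask_matches("Request from 192.168.0.1 at 10:42", ip_pattern))
# Request from ***.***.*.* at 10:42
```

Skipping "." keeps the dotted shape of the address visible while hiding the digits, which is the effect `characters_to_ignore` is meant to give here.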
Judith Guzman
Arnab Mukherjee