As per (Link), the DLP API in GCP can mask sensitive data by partially or fully replacing characters with a symbol (de-identifying sensitive data). However, I didn't find any clue on how to customize the transformation rule in the request. For example, let's say we need to transform a 16-digit account number so that, once the value has been detected, the first 6 digits "and" the last 4 digits are left intact while the remaining digits are replaced by "*" (123456******3456), or any similar combination. The configuration seems to only allow masking the first "or" the last digits of the field, not both.
{
  "deidentifyConfig": {
    "recordTransformations": {
      "fieldTransformations": [
        {
          "fields": [
            {
              "name": "NUMBER_ACCOUNT"
            }
          ],
          "primitiveTransformation": {
            "characterMaskConfig": {
              "maskingCharacter": "#",
              "numberToMask": -6
            }
          }
        }
      ]
    }
  }
}
Result of the configuration above:
"stringValue": "#########123456"
The numberToMask field sets the number of characters to mask and, in combination with reverseOrder, lets us obscure just the first or the last digits. But what about both?
Is it possible to use a regex or a transformation rule to create a custom deidentifyConfig? Or what would be the approach to inspect (detect) a specific kind of sensitive data and apply a custom masking rule using DLP?
For example, how can I get these masked values:
12345678****3456
12345678******56
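To make the desired rule concrete, here is a minimal Python sketch of the masking I am after, applied to an already-detected value. It runs outside DLP and does not call the DLP API; the function name `mask_middle` and its parameters are hypothetical, chosen only for illustration.

```python
def mask_middle(value: str, keep_first: int, keep_last: int, mask_char: str = "*") -> str:
    """Keep the first `keep_first` and last `keep_last` characters; mask the rest.

    Hypothetical helper: this is plain string handling, not a DLP transformation.
    """
    if keep_first < 0 or keep_last < 0:
        raise ValueError("keep_first and keep_last must be non-negative")
    if keep_first + keep_last >= len(value):
        # Nothing is left between the two kept ends, so return the value as-is.
        return value
    masked_len = len(value) - keep_first - keep_last
    # Slice from an explicit index so keep_last == 0 yields an empty tail.
    tail = value[len(value) - keep_last:]
    return value[:keep_first] + mask_char * masked_len + tail


print(mask_middle("1234567890123456", 6, 4))  # 123456******3456
print(mask_middle("1234567812343456", 8, 4))  # 12345678****3456
print(mask_middle("1234567812343456", 8, 2))  # 12345678******56
```

What I am looking for is a way to express the equivalent of `keep_first` and `keep_last` together in a single deidentifyConfig.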
Note: Dynamic Data Masking in BigQuery is not an option here, since it does not yet offer a way to create a custom masking rule.