Following is my JSON object for the DLP API call to mask specific columns of data in a Parquet file that is stored in a GCS bucket. When calling the dlp.deidentify_content() method I have to pass an item to it, but I am not sure how to pass the Parquet file. I have already specified the Parquet file path.
inspect_config = {
    'info_types': info_types,
    'custom_info_types': custom_info_types,
    'min_likelihood': min_likelihood,
    'limits': {'max_findings_per_request': max_findings},
}
actions = [{
    'saveFindings': {
        'outputConfig': {
            'table': {
                'projectId': project,
                'datasetId': 1,
                'tableId': "result1"
            }
        }
    }
}]
# Construct a storage_config containing the file's URL.
url = 'gs://{}/{}'.format(bucket, filename)
storage_config = {
    'cloud_storage_options': {
        'file_set': {'url': url}
    }
}
# Construct deidentify configuration dictionary
deidentify_config = {
    "recordTransformations": {
        "fieldTransformations": [
            {
                "fields": [
                    {"name": "IP-address"}
                ],
                "primitiveTransformation": {
                    "cryptoHashConfig": {
                        "cryptoKey": {
                            "transient": {
                                "name": "[TRANSIENT-CRYPTO-KEY-1]"
                            }
                        }
                    }
                }
            },
            {
                "fields": [
                    {"name": "comments"}
                ],
                "infoTypeTransformations": {
                    "transformations": [
                        {
                            "infoTypes": [
                                {"name": "PHONE_NUMBER"},
                                {"name": "EMAIL_ADDRESS"},
                                {"name": "IP_ADDRESS"}
                            ],
                            "primitiveTransformation": {
                                "cryptoHashConfig": {
                                    "cryptoKey": {
                                        "transient": {
                                            "name": "[TRANSIENT-CRYPTO-KEY-2]"
                                        }
                                    }
                                }
                            }
                        }
                    ]
                }
            }
        ]
    }
}
# Call the API
response = dlp.deidentify_content(
    parent,
    inspect_config=inspect_config,
    deidentify_config=deidentify_config,
    item=item)
What I am trying to accomplish is to mask a few columns of a Parquet file that is in a GCS bucket and then store the masked data as a table in BigQuery.
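
For context on the `item` argument: deidentify_content() takes the content itself as a ContentItem (a `value`, a `table`, or a `byte_item`), not a storage path, so the rows would have to be read from the Parquet file first. Below is a minimal sketch of what a table-shaped `item` could look like. The helper name `rows_to_dlp_item` is hypothetical, the sample rows are made up for illustration, and loading the Parquet file (for example with pandas.read_parquet) is assumed to happen elsewhere. Every cell is sent as a `string_value` here for simplicity, which may not suit every column type.

```python
def rows_to_dlp_item(column_names, rows):
    """Build a DLP ContentItem dict that holds tabular data.

    Hypothetical helper, not part of the DLP client library.
    column_names: list of column names, e.g. the Parquet schema names.
    rows: iterable of row sequences, one value per column.
    """
    headers = [{"name": name} for name in column_names]
    table_rows = [
        # Assumption: every cell is stringified; None becomes an empty string.
        {"values": [{"string_value": "" if v is None else str(v)} for v in row]}
        for row in rows
    ]
    return {"table": {"headers": headers, "rows": table_rows}}


# Made-up sample data using the column names from deidentify_config.
item = rows_to_dlp_item(
    ["IP-address", "comments"],
    [
        ["10.0.0.1", "call me at 555-0100"],
        ["10.0.0.2", None],
    ],
)
```

The header names need to match the `fields` names in deidentify_config ("IP-address" and "comments") for the field transformations to apply. The storage_config above would not be used by this call.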