
I'm struggling with DLP: each month I have to scan 1,000 rows from each of more than 30,000 BigQuery tables. Instead of having to grant permissions on each BigQuery project to that project's DLP managed service account, we would rather use one "master" project and authorize only its DLP managed service account.

That would let us manage the jobs from a single project instead of having to check status and configuration across many different projects.

I tried to create a trigger for each of our tables using a template, but the limit is 1,000 triggers per project.

So I'm wondering what the right strategy is. Do I have to create jobs each month?
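For reference, one way around the 1,000-trigger limit is to create one-off inspection jobs on your own schedule (for example from a monthly script or Cloud Scheduler task), since directly created jobs are not subject to the trigger quota. Below is a minimal sketch using the google-cloud-dlp Python client; the project, dataset, table, and template names are hypothetical placeholders, and it assumes the master project's DLP managed service account has been granted BigQuery read access on each data project:

```python
import google.cloud.dlp_v2 as dlp_v2

# Client runs in the "master" project; its DLP managed service account
# must have BigQuery read access on each data project it scans.
client = dlp_v2.DlpServiceClient()

MASTER_PROJECT = "master-project"  # hypothetical master project ID
TEMPLATE = (                       # hypothetical inspect template name
    f"projects/{MASTER_PROJECT}/inspectTemplates/monthly-scan-template"
)

def create_sampled_inspect_job(data_project: str, dataset: str, table: str):
    """Create a one-off DLP inspect job that samples 1,000 rows of a table."""
    inspect_job = {
        "inspect_template_name": TEMPLATE,
        "storage_config": {
            "big_query_options": {
                "table_reference": {
                    "project_id": data_project,
                    "dataset_id": dataset,
                    "table_id": table,
                },
                # Only scan 1,000 rows, starting at a random offset.
                "rows_limit": 1000,
                "sample_method": dlp_v2.BigQueryOptions.SampleMethod.RANDOM_START,
            }
        },
        # Save findings to a results table in the master project
        # (hypothetical dataset/table names).
        "actions": [
            {
                "save_findings": {
                    "output_config": {
                        "table": {
                            "project_id": MASTER_PROJECT,
                            "dataset_id": "dlp_results",
                            "table_id": "findings",
                        }
                    }
                }
            }
        ],
    }
    return client.create_dlp_job(
        request={"parent": f"projects/{MASTER_PROJECT}", "inspect_job": inspect_job}
    )

# Looping over an inventory of the 30,000 tables each month would then
# replace the 30,000 triggers with plain jobs.
job = create_sampled_inspect_job("data-project-1", "my_dataset", "my_table")
print(job.name)
```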

SMA

1 Answer


Have you considered Data Profiling? It lets you set up org- or project-level profiling of ALL your tables so that you don't have to manage the orchestration. It gives you column-level details identifying the likely infoTypes found within tables. You can choose the cadence of re-profiling based on either table data changes or schema changes (or no updates at all).


Jordanna Chord
  • Adding link to the docs here: https://cloud.google.com/dlp/docs/data-profiles – Scott Ellis Nov 18 '22 at 23:36
  • Thank you. I considered using data profiling, but it implies scanning all the data, and since there are several hundred terabytes, it would represent a huge cost my client is not willing to pay. – SMA Nov 21 '22 at 08:39
  • When you say "scan all the data", are you concerned about wanting to exclude certain tables altogether, or more about larger tables being charged the $0.03/GB? – Scott Ellis Nov 21 '22 at 20:27
  • A few things to consider: 1) You can include/exclude tables/datasets by name/regex: https://cloud.google.com/dlp/docs/profile-org-folder#manage-schedules 2) For pricing, the "billable bytes per table is equal to the table's size or 3 TB, whichever is lower." So tables larger than 3 TB are billed at 3 TB: https://cloud.google.com/dlp/pricing#data_profiling_pricing – Scott Ellis Nov 21 '22 at 20:34
  • Yes, the problem is scanning all rows in all tables: at $0.03/GB times our total size, it would come to around $15,000 (see the cost sketch after this thread). And my client does not want to pay that much each time: we will have to rescan data to adjust info_types, exclusion_rules, hotwords... I have to check, but I don't think there are many tables larger than 3 TB here. – SMA Nov 23 '22 at 15:25
  • If you happen to have a support contract, file a ticket referencing this post and they should be able to connect you to Scott and me to discuss pricing and our roadmap in this space. – Jordanna Chord Dec 08 '22 at 05:20
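For context on the numbers in this thread, here is a back-of-the-envelope version of the cost estimate, assuming an illustrative total of 500 TB ("several hundred terabytes"), the $0.03/GB profiling price, and the 3 TB per-table billing cap quoted above:

```python
# Rough data-profiling cost estimate using the figures from the comments.
PRICE_PER_GB = 0.03  # USD/GB, from the DLP data profiling pricing page
TOTAL_TB = 500       # assumed illustrative total ("several hundred TB")
CAP_TB = 3           # billable bytes per table are capped at 3 TB

total_gb = TOTAL_TB * 1024
print(f"Uncapped estimate: ${total_gb * PRICE_PER_GB:,.0f}")  # ~$15,360

# The 3 TB cap only reduces the bill for tables larger than 3 TB,
# e.g. a single 10 TB table is billed as if it were 3 TB:
billed_gb = min(10, CAP_TB) * 1024
print(f"10 TB table billed as: ${billed_gb * PRICE_PER_GB:,.2f}")  # ~$92.16
```

This matches the ~$15,000 figure mentioned above, and shows why the cap only helps if a meaningful share of the data sits in tables larger than 3 TB.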