In my new job at a community hall in the Netherlands, we work with databases that contain privacy-sensitive data (e.g. citizen service numbers). They also recently started working with Azure, which I'm getting familiar with as we speak. So this might be a beginner's question, but I hope someone can point me in the right direction: is there a way to retrieve data through a direct connection with a database and make it 'anonymous', for example by hashing or by using a key file of some sort, somewhere in the pipeline? I know that the pipelines are JSON files and that it's possible to do some transformations. I'm curious about the possibilities for doing this in Azure!

** EDIT **

To be clearer: I want to write a piece of code, preferably in the pipeline, that does something like this:

citizen service number person 1
102541220
# generate key/hash somewhere in the pipeline while loading the data into Azure
anonymous citizen service number, that is specific for person 1
0x10325476

Later, I want to add columns to this database, for example the value of the house this person lives in. I want to be able to 'couple' the databases by using the anonymous citizen service number:

anonymous citizen service number 1
0x10325476
Hannie
  • By anonymous you mean encrypted? If yes then maybe SSL connection is sufficient. – hendryanw Mar 20 '18 at 09:07
  • Okay... I get what you mean, but I'm not sure that is what I'm looking for. To be more specific: we don't want the citizen service number to be displayed in the dataset, but a hashed number or key, so to speak. – Hannie Mar 20 '18 at 11:27
  • Cool. You're right, it seems you need to use some kind of hashing algorithm. It would be helpful for everyone to know which Azure services or frameworks are involved in this pipeline. But generally you want to use something like SHA256 (maybe with some salt) to hash the number in this pipeline and store the hashed value in a new column. – hendryanw Mar 20 '18 at 11:44
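
To make the hashing idea from the comments concrete, here is a minimal Python sketch. It is not tied to any particular Azure service and is not taken from the thread: it swaps the "SHA256 with some salt" suggestion for a keyed hash (HMAC-SHA256), because a 9-digit number has so few possible values that an unkeyed SHA-256 could be reversed by simply trying them all. The key value, the column names (`csn`, `anon_csn`, `house_value`) and the helper names are all hypothetical, chosen only for illustration.

```python
import hashlib
import hmac

# Hypothetical secret key. In a real pipeline it would come from a secret
# store and would never sit next to the data it protects.
SECRET_KEY = b"replace-with-a-long-random-secret"


def pseudonymize(citizen_service_number: str, key: bytes = SECRET_KEY) -> str:
    """Return a deterministic keyed hash (HMAC-SHA256) as a hex string."""
    normalized = citizen_service_number.strip()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_rows(rows, column="csn"):
    """Drop the raw number from each row and add its pseudonym instead."""
    result = []
    for row in rows:
        clean = {k: v for k, v in row.items() if k != column}
        clean["anon_csn"] = pseudonymize(row[column])
        result.append(clean)
    return result


# Two made-up source tables that both carry the raw number when loaded.
persons = [{"csn": "102541220", "municipality": "A"}]
houses = [{"csn": "102541220", "house_value": 250000}]

anon_persons = anonymize_rows(persons)
anon_houses = anonymize_rows(houses)

# The same input and key always give the same pseudonym, so the two
# tables can still be coupled without the raw number being present.
houses_by_id = {row["anon_csn"]: row for row in anon_houses}
joined = [
    {**person, **houses_by_id[person["anon_csn"]]}
    for person in anon_persons
    if person["anon_csn"] in houses_by_id
]
```

Because the pseudonym depends on the key, anyone holding the key can recompute it from a known number, so this is pseudonymisation rather than irreversible anonymisation, and the key needs to be protected accordingly.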

1 Answer

It sounds like you'd be interested in Azure SQL Database dynamic data masking.

SQL Database dynamic data masking limits sensitive data exposure by masking it to non-privileged users.

Dynamic data masking helps prevent unauthorized access to sensitive data by enabling customers to designate how much of the sensitive data to reveal with minimal impact on the application layer. It’s a policy-based security feature that hides the sensitive data in the result set of a query over designated database fields, while the data in the database is not changed.

For example, a service representative at a call center may identify callers by several digits of their credit card number, but those data items should not be fully exposed to the service representative. A masking rule can be defined that masks all but the last four digits of any credit card number in the result set of any query. As another example, an appropriate data mask can be defined to protect personally identifiable information (PII) data, so that a developer can query production environments for troubleshooting purposes without violating compliance regulations.

https://learn.microsoft.com/en-us/azure/sql-database/sql-database-dynamic-data-masking-get-started

This won't anonymise the data irreversibly: the data in the database is not changed, so anyone who has the right permissions in SQL Server can still see the unmasked values.

It will, however, allow you to do joins inside SQL Server without exposing the personal data in the query results.

Alex KeySmith