3

I'm currently reviewing a database design that, in order to handle the removal of user records under requirements such as the DPA and the EU GDPR Right to be Forgotten, proposes not to enforce referential integrity between the user record and 'related' tables such as Transaction, Communication Event, etc. The idea is that the user record can be deleted when requested, while records in the related tables (which use a non-identifying key/sequence number) remain intact.

So, before I push back on this and open up the 'discussion' that will follow, I wondered whether anyone thought it was ever acceptable to remove, or do without, referential integrity in cases such as this. Or should other methods be used, such as masking the user details, or changing the user record to a placeholder record to show that the transaction relates to a redacted user?

All thoughts welcome...

Si Downes

2 Answers

5

This is a complicated topic that goes beyond referential integrity constraints.

My understanding of the EU privacy restrictions (and I stress that I am not a lawyer) is that they relate to personally identifiable information, not to business related "anonymous" relationships. For instance, I think you can still count a removed user as "active" for the period when they were active; you just can't know who they are.

My approach would be to put all PII data into a single table/database. When a user wishes to be forgotten, I would update the record to remove the PII. All the foreign key relationships are then fine. You are just missing the name, address, email address, and whatever else is deemed PII.
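A minimal sketch of that idea, using Python's built-in `sqlite3` module so it can be run as-is. The table and column names (`user_pii`, `txn`, `forgotten_at`) and the sample row are assumptions made up for illustration; they are not from the design under review. The point is only that the row and its surrogate key survive, so the foreign key from the transaction table stays valid while the identifying columns are blanked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE user_pii (
    user_id      INTEGER PRIMARY KEY,  -- surrogate, non-identifying key
    name         TEXT,
    email        TEXT,
    address      TEXT,
    forgotten_at TEXT                  -- NULL while PII is still held
);
CREATE TABLE txn (
    txn_id  INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES user_pii(user_id),
    amount  REAL NOT NULL
);
INSERT INTO user_pii (user_id, name, email, address)
VALUES (1, 'Ann Example', 'ann@example.com', '1 Example Street');
INSERT INTO txn (txn_id, user_id, amount) VALUES (10, 1, 25.0);
""")


def forget_user(conn, user_id):
    """Blank the PII columns but keep the row, so foreign keys still resolve."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """UPDATE user_pii
               SET name = NULL, email = NULL, address = NULL,
                   forgotten_at = datetime('now')
               WHERE user_id = ?""",
            (user_id,),
        )


forget_user(conn, 1)
```

After `forget_user(conn, 1)`, the transaction row still joins to `user_pii` on `user_id`, but nothing in that row says who the user was. Whether a retained surrogate key is acceptable under the regulation is a legal question, not one this sketch answers.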

Just identifying the PII is very tricky, because email addresses and user names and so on can be embedded in the most unusual places (URLs are one obvious place to look but there can be others).

I don't recommend actually removing all traces of the person from all databases. You will then be in a situation where your reports no longer balance . . . Oh, our reports said we had 1,000,000 customers then, but we can only find 999,900 of them. Let's waste a bunch of people's efforts to figure out what happened.

My suggestion: Be careful. This is a long process, so set expectations in your organization accordingly.

Gordon Linoff
  • I'm not familiar with that law but if it's possible, I would want to go with a flag in the Users table that signifies "forget this user." That way, the user does not appear in normal processing but can still be available for certain in-house processing such as, oh, complying with a subpoena and such. – TommCatt Oct 10 '16 at 04:49
  • @TommCatt . . . Unfortunately, that is definitely not sufficient for the EU laws. – Gordon Linoff Oct 10 '16 at 23:48
2

Please have a look at retention laws for your industry. People have the right to be forgotten, but businesses also have a legal obligation to retain certain records for a period of time.

At this point it's unclear to me which regulation overrides the other, so my advice is to bring in a legal expert who will be able to clear up this matter.

From a technical perspective, your application might require business data related to private data, so a good approach is to flag the records as forgotten and replace private data with generated data. This way, your application keeps behaving the same way, but the private information is gone.

This is a simple approach that can be applied to many legacy applications, and it can even be automated as a process.
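As a rough illustration of "flag and replace with generated data", here is a small Python sketch. The field names (`name`, `email`, `phone`, `forgotten`) and the `pseudonymise` helper are hypothetical; a real application would have its own list of private fields. The generated values come from a random token rather than from the original data, so they cannot be reversed to recover it.

```python
import uuid


def pseudonymise(record):
    """Return a copy of record with private fields replaced and a flag set."""
    token = uuid.uuid4().hex  # random, not derived from the original values
    cleaned = dict(record)    # leave the caller's dict untouched
    cleaned["name"] = f"Forgotten User {token[:8]}"
    # .invalid is a reserved TLD, so this address can never receive mail
    cleaned["email"] = f"{token}@example.invalid"
    cleaned["phone"] = None
    cleaned["forgotten"] = True
    return cleaned


original = {
    "id": 42,
    "name": "Ann Example",
    "email": "ann@example.com",
    "phone": "555-0100",
    "forgotten": False,
}
cleaned = pseudonymise(original)
```

The record keeps its `id` and its shape, so code that expects a non-empty name or a syntactically valid email address keeps working, which is what makes this approach easy to retrofit.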

The only thing you must watch out for is backups, as your changes might be reverted if data has to be restored from a backup. Keep a separate table with keys pointing to the records that are required to be forgotten, so that if a restore overwrites the latest changes, you can run your automation script again to remove those who want to be forgotten.
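A sketch of that replay step, again with Python's `sqlite3`. The table names (`app_user`, `forget_request`), columns, and sample rows are assumptions for illustration. For brevity the forget list lives in the same in-memory database here; in practice it would need to be stored somewhere a restore does not overwrite, or it would be rolled back along with everything else.

```python
import sqlite3


def reapply_forget_requests(conn):
    """Re-run the redaction for every key in forget_request. Safe to repeat."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            """UPDATE app_user
               SET name = 'REDACTED', email = NULL, forgotten = 1
               WHERE user_id IN (SELECT user_id FROM forget_request)"""
        )
    return cur.rowcount


# Simulate the state right after a restore: the backup still holds Ann's details.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_user (
    user_id   INTEGER PRIMARY KEY,
    name      TEXT,
    email     TEXT,
    forgotten INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE forget_request (
    user_id      INTEGER PRIMARY KEY,  -- key only, no private data
    requested_at TEXT NOT NULL
);
INSERT INTO app_user VALUES
    (1, 'Ann Example', 'ann@example.com', 0),
    (2, 'Bob Example', 'bob@example.com', 0);
INSERT INTO forget_request VALUES (1, '2016-10-09');
""")

changed = reapply_forget_requests(conn)
```

Because the forget list holds only keys, it does not itself reintroduce the private data, and running the script a second time leaves the rows in the same redacted state.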

DragonBe