For the EU's GDPR compliance (user privacy), we need to redact personally identifiable information from the versions of our records. I've come up with something that seems to work, but I figure I should ask whether there's an established way to do this.

class User < ActiveRecord::Base
  has_paper_trail
end

user = User.create! name: 'Josh'
user.update name: 'Josh2'
user.update name: 'Josh3'
user.destroy!

def self.get_data
  PaperTrail::Version.order(:id).where(item_id: 1).map { |ver| [ver.event, ver.object, ver.object_changes] }
end

# =====  BEFORE  =====
get_data
# => [["create", nil, {"id"=>[nil, 1], "name"=>[nil, "Josh"]}],
#     ["update", {"id"=>1, "name"=>"Josh"}, {"name"=>["Josh", "Josh2"]}],
#     ["update", {"id"=>1, "name"=>"Josh2"}, {"name"=>["Josh2", "Josh3"]}],
#     ["destroy", {"id"=>1, "name"=>"Josh3"}, nil]]

PaperTrail::Version.where_object_changes(name: 'Josh').each do |ver|
  ver.object['name'] = 'REDACTED' if ver.object && ver.object['name'] == 'Josh'
  if oc = ver.object_changes
    oc['name'] = oc['name'].map { |name| name == 'Josh' ? 'REDACTED' : name }
    ver.object_changes = oc
  end
  ver.save!
end

# =====  AFTER  =====
get_data
# => [["create", nil, {"id"=>[nil, 1], "name"=>[nil, "REDACTED"]}],
#     ["update",
#      {"id"=>1, "name"=>"REDACTED"},
#      {"name"=>["REDACTED", "Josh2"]}],
#     ["update", {"id"=>1, "name"=>"Josh2"}, {"name"=>["Josh2", "Josh3"]}],
#     ["destroy", {"id"=>1, "name"=>"Josh3"}, nil]]

UPDATE: Actually, I'm also going to need to scope the records by an association, so my example isn't sufficient.

Joshua Cheek
  • I'm voting to close this question as off-topic because Stack Overflow is not a legal authority on how to handle GDPR compliance. – tadman May 22 '18 at 17:10
  • Do you need to track that there was a change? PaperTrail has the `:ignore` and `:only` options to only watch for changes on certain attributes. – mr rogers May 22 '18 at 17:11
  • Why are you only redacting `Josh`? If I change my name to `Joshua` I would expect that too to be redacted as it still personally identifies me in the `update`. Plus it would be easier to target known keys to redact rather than specific key value pairs – engineersmnky May 22 '18 at 17:56
  • @tadman not asking for legal advice, just how to redact information in PaperTrail. – Joshua Cheek May 22 '18 at 18:23
  • @mrrogers I'm not sure I understand the question. We track all changes to a subset of fields, and I need to redact specific PII from those historical records. We use those features only for reducing irrelevant info which gets spammy. Until a piece of data is redacted, we expect full normal functionality. – Joshua Cheek May 22 '18 at 18:27
  • @engineersmnky In that situation, we would redact two pieces of information. Note that this example is merely intended to explore how to redact info, not to be representative of the actual data I need to redact. It's a reasonable consideration, but if we do decide to do something along those lines, it will be a business decision, based on requirements that emerge in practice and conversations between the business team and the legal team. – Joshua Cheek May 22 '18 at 18:30
  • My point was that your existing "use case" shows we are redacting the name "Josh", but when the name changes we do not redact the change. This does not make logical sense to me. Do you know the key fields (hash keys in this case) you want to redact, or are the redactions arbitrary as shown? What are the actual rules for this? They are unclear right now. Also, what Ruby version are you using? It affects how answers would update values in a `Hash`. – engineersmnky May 22 '18 at 18:43
  • @engineersmnky within the context of my example, there should be no way to go from the name `"Josh"` to any data associated with Josh. While my example changes `"Josh"` to `"Josh2"`, these are to be considered different names. In my real example, it's based on email addresses, and the requirements are more nuanced. Do you know if this is the way to redact information from PaperTrail's versions table? Or, more generally, perhaps: is this the way to modify versioned history? – Joshua Cheek May 22 '18 at 18:52
  • You can probably get an answer for that narrow problem, but for GDPR in general, you'll need a professional assessment. For anything involving potential fines I'm just trying to be careful here. – tadman May 22 '18 at 22:52

1 Answer

For the EU's GDPR compliance (user privacy), we need to redact personally identifiable information from the versions of our records. I've come up with something that seems to work, but figure I should ask if there's an established way to do this.

No, as of today, 2018-05-30, there is no built-in feature or documented solution for GDPR redaction.

PaperTrail provides many ways to iterate over and query records in the versions table. `where_object_changes` is one such feature, but it generates some pretty complicated SQL. For example, against a YAML text column:

where_object_changes(name: 'Joan')

SELECT "versions".*
FROM "versions"
WHERE .. ("versions"."object_changes" LIKE '%
name:
- Joan
%' OR "versions"."object_changes" LIKE '%
name:
-%
- Joan
%')

You may, justifiably, have concerns about the correctness of this query. In fact, as of PaperTrail 9.0.0, using `where_object_changes` to read YAML from a text column raises an error to that effect. Reading JSON from a text column or from a json/jsonb column is still allowed.

Anyway, if I've succeeded in making you wary of such complicated SQL, then you should choose a simpler approach, perhaps iterating over all of the version records for that user (`user.versions.find_each`) and redacting each one in Ruby.
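As a rough sketch of that simpler approach (not an established PaperTrail feature), the redaction itself can be kept in plain Ruby and applied to each version in turn. `redact_version_data` and `REDACTED` are hypothetical names invented for this example. The sketch assumes `object` and `object_changes` deserialize to hashes with string keys, as in the question's output, and it redacts by attribute key rather than by matching a specific value such as 'Josh'.

```ruby
# Hypothetical helper, not part of PaperTrail. Returns redacted copies of a
# version's `object` and `object_changes` hashes; the inputs are not mutated.
REDACTED = 'REDACTED'

def redact_version_data(object, object_changes, keys)
  keys = keys.map(&:to_s)

  # `object` is nil for "create" versions.
  redacted_object = object && object.map { |attr, value|
    [attr, (keys.include?(attr) && !value.nil?) ? REDACTED : value]
  }.to_h

  # `object_changes` is nil for "destroy" versions. Each entry is a
  # [before, after] pair; nil entries are kept so "was unset" stays visible.
  redacted_changes = object_changes && object_changes.map { |attr, pair|
    [attr, keys.include?(attr) ? pair.map { |value| value.nil? ? nil : REDACTED } : pair]
  }.to_h

  [redacted_object, redacted_changes]
end

# Assumed usage inside a Rails app (requires the `user` record to be at hand):
#
#   user.versions.find_each do |ver|
#     ver.object, ver.object_changes =
#       redact_version_data(ver.object, ver.object_changes, %w[name])
#     ver.save!
#   end
```

Redacting by key means a later rename (say, to 'Joshua') is covered as well. If only one specific value should be removed, the condition inside the two `map` blocks would need to compare values instead. If the versions table stores YAML or JSON in plain text columns, `ver.object` is a string and would have to be deserialized first.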

Jared Beck