Context: we have a telemetry system for our service and would like to track retention, how many users use various features, and so on.
There are two options for handling user-identifiable information while staying GDPR compliant:
- Support deleting user information on request
- Keep data for less than 30 days
Option #1 is hard to implement for a telemetry system. Option #2 doesn't allow answering questions such as "what is the 6-month retention for feature X?".
One idea for answering such questions is to compute HyperLogLog blobs per feature every day or week and store them separately forever. Going forward, these blobs could then be merged to compute distinct counts (dcount) and retention.
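To illustrate what I mean, here is a minimal sketch of the idea in Python. It is a toy HyperLogLog, not the implementation of any particular telemetry system; the function names, the SHA-256 hash, the precision `P = 12`, and the `user-N` identifiers are all assumptions made for the example.

```python
import hashlib
import math

P = 12        # assumed precision: 2**P registers per blob
M = 1 << P

def _hash64(user_id):
    # 64-bit hash of the user ID (the choice of SHA-256 is an assumption).
    return int.from_bytes(hashlib.sha256(user_id.encode("utf-8")).digest()[:8], "big")

def new_sketch():
    return [0] * M

def add(sketch, user_id):
    h = _hash64(user_id)
    idx = h >> (64 - P)                       # top P bits select a register
    rest = h & ((1 << (64 - P)) - 1)          # remaining 64 - P bits
    rank = (64 - P) - rest.bit_length() + 1   # position of the leftmost 1-bit
    sketch[idx] = max(sketch[idx], rank)

def sketch_of(user_ids):
    s = new_sketch()
    for uid in user_ids:
        add(s, uid)
    return s

def merge(a, b):
    # The merged blob is the register-wise maximum, i.e. the blob of the union.
    return [max(x, y) for x, y in zip(a, b)]

def estimate(sketch):
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in sketch)
    zeros = sketch.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)        # small-range (linear counting) correction
    return raw

# Hypothetical data: users of feature X in two different weeks (true overlap: 500).
week_1 = sketch_of(f"user-{i}" for i in range(0, 1000))
week_2 = sketch_of(f"user-{i}" for i in range(500, 1500))

both = merge(week_1, week_2)
# Inclusion-exclusion: |A and B| is approximately |A| + |B| - |A or B|.
retained = estimate(week_1) + estimate(week_2) - estimate(both)
print(round(estimate(both)), round(retained))
```

Only the register arrays (`week_1`, `week_2`) would be stored long term; the raw user IDs would not be kept. The retention figure is an estimate, and its error grows as the overlap gets small relative to the two sets.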
Assuming that all user-identifiable information is gone after 30 days (after the user account gets deleted), would the HyperLogLog blobs still make it possible to track users, i.e. to answer whether a particular user used feature X two years ago?
If they do, then the approach is not compliant (which doesn't mean it is compliant if they don't).
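To make the concern concrete, this is the kind of check I am worried about, again as a toy Python sketch under my own assumptions (same hypothetical hash and precision as a typical HyperLogLog layout; `could_contain` is a made-up name). It presumes that whoever runs it somehow knows the user's identifier and the hash function that was used.

```python
import hashlib

P = 12        # assumed precision: 2**P registers per blob
M = 1 << P

def _register_and_rank(user_id):
    # Same assumed hashing scheme as used when the blob was built.
    h = int.from_bytes(hashlib.sha256(user_id.encode("utf-8")).digest()[:8], "big")
    idx = h >> (64 - P)
    rest = h & ((1 << (64 - P)) - 1)
    rank = (64 - P) - rest.bit_length() + 1
    return idx, rank

def new_sketch():
    return [0] * M

def add(sketch, user_id):
    idx, rank = _register_and_rank(user_id)
    sketch[idx] = max(sketch[idx], rank)

def could_contain(sketch, user_id):
    # True if adding user_id would leave the blob unchanged.
    # False proves the user was never added; True is only "possibly added".
    idx, rank = _register_and_rank(user_id)
    return sketch[idx] >= rank

feature_x_blob = new_sketch()
add(feature_x_blob, "alice")                   # hypothetical user ID

print(could_contain(feature_x_blob, "alice"))  # a user who was added always passes
print(could_contain(new_sketch(), "alice"))    # an empty blob rules the user out
```

A `False` result is definitive, while a `True` result can also occur for users who were never added; how often that happens depends on how full the blob is.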