I have a project that takes free-text user input (strings of fewer than 80 characters), and I need to detect PII within each string. Detection has to happen in real time: we need to respond to the user within about 2 seconds, and the response depends partly on whether the text contains PII.

I have already found some solutions, but none is quite what I'm looking for:
- Google DLP - requests take over two seconds to process a string, so it cannot be used.
- redact-pii (npm module) - its detections are too simplistic.
- AWS Macie - runs on existing data stores, not on in-flight data.

Do you have any suggestions for services or libraries that can help with this?

The specific PII we want to detect includes name, address, and phone number, as well as sensitive PII (SPII) such as credit card number and social insurance number. Essentially, we want our handling of free text to comply with standards such as PIPEDA and GDPR.
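To illustrate what a local, low-latency check for the structured identifiers could look like, here is a minimal sketch in Python. It assumes that credit card numbers and Canadian social insurance numbers can be pre-filtered with a regular expression and then validated with the Luhn checksum, which both number formats use. The function names and patterns are hypothetical, not taken from any library. It does not cover names or addresses, which need more than pattern matching.

```python
import re
from typing import List, Tuple

# Candidate card numbers: 13-19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)")
# Candidate Canadian SINs: 9 digits, commonly written in groups of 3-3-3.
SIN_CANDIDATE = re.compile(r"(?<!\d)\d{3}[ -]?\d{3}[ -]?\d{3}(?!\d)")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_structured_pii(text: str) -> List[Tuple[str, str]]:
    """Return (label, matched_text) pairs for Luhn-valid candidates in text."""
    findings = []
    for label, pattern in (("credit_card", CARD_CANDIDATE), ("sin", SIN_CANDIDATE)):
        for match in pattern.finditer(text):
            digits = re.sub(r"\D", "", match.group())
            if luhn_valid(digits):
                findings.append((label, match.group()))
    return findings
```

For strings under 80 characters this runs in well under a millisecond, so it could sit in front of a slower service that handles names and addresses.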

Andrew Xia
  • While this question is a bit too broad, it is also asking for a software tool recommendation. Unfortunately, [this type of question isn't really appropriate](https://meta.stackoverflow.com/a/251135/2486496) for the site. Going back to the "too broad" aspect: have you considered writing an in-house tool from the ground up specifically to catch the type of PII you're looking to trap, e.g. credit card numbers or names/addresses? *What type of data are you looking to prevent flowing?* – gravity Aug 24 '18 at 18:22
  • Thanks for your response! I added examples of the types to the main post. We have considered building an in-house plugin; however, we would prefer to use an existing third-party tool if one exists. – Andrew Xia Aug 24 '18 at 18:30
  • Very valid question. The Google DLP API is so slow ... if someone can explain how to make it faster maybe. – TalBeno Aug 24 '18 at 19:45

1 Answer

How about the recently launched Amazon SNS message data protection feature? It can detect and protect PII and PHI in motion, in real time, without custom code.

https://aws.amazon.com/about-aws/whats-new/2022/09/amazon-sns-preview-message-data-protection-sensitive-data-in-motion/

This feature supports the data identifiers that you’re looking to scan for, including name, address, phone number, credit card number, and social insurance number.
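As a rough sketch of how this might be wired up, the Python snippet below builds a data protection policy that denies inbound messages containing selected identifiers and attaches it to a topic with boto3's `put_data_protection_policy`. The policy layout and the identifier names (`Name`, `Address`, `CreditCardNumber`) reflect my reading of the AWS documentation and should be treated as assumptions to verify there. The policy name, `Sid`, and helper function names are hypothetical.

```python
import json

# ARN prefix for AWS managed data identifiers (assumed from the AWS docs).
IDENTIFIER_PREFIX = "arn:aws:dataprotection::aws:data-identifier/"


def build_deny_policy(identifier_names):
    """Build a policy dict that denies inbound messages matching the identifiers."""
    return {
        "Name": "block-pii-inbound",  # hypothetical policy name
        "Description": "Deny publishes that contain selected PII",
        "Version": "2021-06-01",
        "Statement": [
            {
                "Sid": "deny-inbound-pii",  # hypothetical statement id
                "DataDirection": "Inbound",
                "Principal": ["*"],
                "DataIdentifier": [IDENTIFIER_PREFIX + n for n in identifier_names],
                "Operation": {"Deny": {}},
            }
        ],
    }


def attach_policy(topic_arn, policy):
    """Attach the policy to an SNS topic (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the policy can be built without boto3 installed

    sns = boto3.client("sns")
    sns.put_data_protection_policy(
        ResourceArn=topic_arn,
        DataProtectionPolicy=json.dumps(policy),
    )


policy = build_deny_policy(["Name", "Address", "CreditCardNumber"])
```

With a deny statement like this in place, a publish whose payload matches one of the identifiers should be rejected, which the caller could treat as the "PII detected" signal.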

Otavio Ferreira
  • And, as of November 2022, you can also have real-time PII data redaction and masking, via SNS topics. You can read more about the GA release here: https://aws.amazon.com/about-aws/whats-new/2022/11/amazon-sns-message-data-protection-available-real-time-data-redaction-masking/ – Otavio Ferreira Mar 02 '23 at 23:29