
I'm having a hard time anonymizing PII with Presidio for a project I'm working on. For example, when I clean the data and pass in an address (e.g. 123 Sesame Street, Los Angeles, California), I get back:

123 Sesame Street, <LOCATION>, <LOCATION>.

While this is a step in the right direction, how can I get it to also anonymize 123 Sesame Street?

I tried adding context words like "I live at", "My address is", and "is my address", hoping Presidio would take a closer look at the text before or after them. It did not help.

Code:

from presidio_anonymizer import AnonymizerEngine
from presidio_analyzer import AnalyzerEngine

analyzer_engine = AnalyzerEngine()
anonymizer_engine = AnonymizerEngine()

pii_context_clues = ['name', 'phone', 'address is', 'my address', 'live at', 'live in']
text = 'My address is 123 Sesame Street, Los Angeles, California'

analysis_results = analyzer_engine.analyze(text=text, language='en', context=pii_context_clues)
redacted_text = anonymizer_engine.anonymize(text, analysis_results)
print(redacted_text)

Output:

text: My address is 123 Sesame Street, <LOCATION>, <LOCATION>

items:
[
{'start': 45, 'end': 55, 'entity_type': 'LOCATION', 'text': '<LOCATION>', 'operator': 'replace'},
{'start': 33, 'end': 43, 'entity_type': 'LOCATION', 'text': '<LOCATION>', 'operator': 'replace'}
]

Desired Output:

text: My address is <LOCATION>, <LOCATION>, <LOCATION>
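
One direction that might work, sketched under assumptions: as far as I can tell, Presidio's default recognizers do not include a street-address entity, and context words only raise the score of entities that some recognizer has already found, so they cannot create a new detection by themselves. A custom regex-based PatternRecognizer registered with the analyzer could cover the street part. The regex, the helper names build_street_recognizer and redact, and the 0.6 score below are illustrative assumptions, not part of Presidio, and the pattern is deliberately narrow.

```python
import re

# Hypothetical pattern: a house number, one to four name words, then a
# common street suffix. It will miss many real-world address formats.
STREET_ADDRESS_REGEX = (
    r"\b\d{1,6}\s+(?:[A-Za-z0-9.'-]+\s+){1,4}"
    r"(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Drive|Dr|Lane|Ln|Way|Court|Ct)\b\.?"
)

# Compiled copy for checking the pattern on its own, without Presidio.
STREET_ADDRESS_RE = re.compile(STREET_ADDRESS_REGEX, re.IGNORECASE)


def build_street_recognizer():
    """Wrap the regex in a Presidio PatternRecognizer (hypothetical helper)."""
    # Imported here so the pattern above can be tested without Presidio installed.
    from presidio_analyzer import Pattern, PatternRecognizer

    street_pattern = Pattern(
        name="street_address",       # illustrative name
        regex=STREET_ADDRESS_REGEX,
        score=0.6,                   # assumed base confidence, tune as needed
    )
    return PatternRecognizer(
        supported_entity="LOCATION",  # reuse LOCATION so the placeholder matches
        patterns=[street_pattern],
        context=["address", "live"],  # context words can boost this recognizer's score
    )


def redact(text):
    """Analyze and anonymize text with the extra recognizer registered."""
    from presidio_analyzer import AnalyzerEngine
    from presidio_anonymizer import AnonymizerEngine

    analyzer = AnalyzerEngine()
    analyzer.registry.add_recognizer(build_street_recognizer())
    results = analyzer.analyze(text=text, language="en")
    return AnonymizerEngine().anonymize(text=text, analyzer_results=results).text
```

Calling redact('My address is 123 Sesame Street, Los Angeles, California') would be the way to try this. Whether the city and state are still picked up depends on the NLP model in use. Using a separate entity name such as STREET_ADDRESS instead of LOCATION would make it easier to tell the regex matches apart from the model's.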
  • Can you provide your tried code, inputs, and their expected outputs by [editing](https://stackoverflow.com/posts/76963492/edit) your post? – shaik moeed Aug 23 '23 at 17:22
  • Welcome to Stackoverflow! Asking for recommendations might not be appropriate on the Stackoverflow (https://stackoverflow.com/help/how-to-ask) but it might be possible to ask the question on https://softwarerecs.stackexchange.com Also, logging it on https://stackoverflow.com/collectives/nlp/beta/discussions/76949597 – alvas Aug 25 '23 at 16:25

0 Answers