Suppose an organization has confidential documents that it would like to use with an LLM (e.g. OpenAI's ChatGPT with GPT-4) but, as a precaution, wants confidential information abstracted away automatically first. Is this possible using the LangChain API without losing much of the context? For example, if the text contains the name of a company, it would simply be replaced with "Company A". I am looking for options that work as a generic method, like embeddings, which understand the semantic meaning of words.
-
I'm not sure I fully understand your question, but you may be looking for anonymization or, even better, homomorphic encryption. – Itération 122442 May 11 '23 at 19:01
-
That's correct. I am looking for something more integrated with a LangChain-style API that understands semantic meaning, so that confidential information is removed before the text is sent to an LLM like ChatGPT and then reattached to the response. That would be seamless for the end user while letting the organization avoid giving too much information to the LLM. – R007 May 11 '23 at 19:03
1 Answer
It looks like you need a redaction step before sending the data to ChatGPT. AWS and Azure both offer APIs that do PII redaction.
https://aws.amazon.com/blogs/machine-learning/detecting-and-redacting-pii-using-amazon-comprehend/
To redact information that isn't PII, there are NER (Named Entity Recognition) services and models available.
https://huggingface.co/dslim/bert-base-NER

Gary Bake