LangChain python - ability to abstract chunk of confidential text before submitting to LLM

Question

If an organization has confidential documents on which it would like to use an LLM (e.g. OpenAI ChatGPT with GPT-4), but as a precaution wants to abstract the confidential information automatically, is that possible with the LangChain API without losing much context? For example, a company name would simply be replaced with "Company A". I am looking for a generic method, similar to embeddings, that understands the semantic meaning of words.

I'm not sure I fully understand your question, but you may be looking for anonymization or, even better, homomorphic encryption. - Itération 122442, May 11 '23 at 19:01
That's correct. I'd like something more integrated with a LangChain-style API that understands semantic meaning, removes the confidential information before the text is sent to an LLM like ChatGPT, and then reattaches it to the response. That way it is seamless for the end user, while the organization avoids giving too much information to the LLM. - R007, May 11 '23 at 19:03

score 1 · Accepted Answer · answered Jul 25 '23 at 10:23

It looks like you need a redaction step before sending the data to ChatGPT. AWS and Azure both offer APIs that do PII redaction.

https://aws.amazon.com/blogs/machine-learning/detecting-and-redacting-pii-using-amazon-comprehend/

https://learn.microsoft.com/en-us/azure/ai-services/language-service/personally-identifiable-information/how-to-call
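Detection services of this kind typically return character offsets for each entity rather than redacted text, so the masking step is up to you. Here is a minimal sketch of that step, assuming entity dicts with `Type`, `BeginOffset` and `EndOffset` keys as in the `Entities` list returned by Amazon Comprehend's `DetectPiiEntities`. The function name `redact_by_offsets` and the sample data are hypothetical and hand-written, not real service output.

```python
def redact_by_offsets(text, entities):
    """Replace each detected span with a [TYPE] tag.

    `entities` is assumed to be a list of dicts with "Type",
    "BeginOffset" and "EndOffset" keys (character offsets into `text`).
    """
    # Work from the end of the string so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        start, end = ent["BeginOffset"], ent["EndOffset"]
        text = text[:start] + f"[{ent['Type']}]" + text[end:]
    return text


# Hand-written sample shaped like a detection result (assumption, not real output).
sample_text = "Contact Jane Doe at jane@example.com."
sample_entities = [
    {"Type": "NAME", "BeginOffset": 8, "EndOffset": 16},
    {"Type": "EMAIL", "BeginOffset": 20, "EndOffset": 36},
]

redacted = redact_by_offsets(sample_text, sample_entities)
print(redacted)
```

This kind of masking is one-way: the tags do not record which value they replaced, so the original text cannot be restored in the LLM's response.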

To redact information that isn't PII, NER (Named Entity Recognition) services and models are available.
https://huggingface.co/dslim/bert-base-NER
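For the round trip described in the question (replace a name with "Company A", then put the real name back in the response), a rough sketch could look like the following. It assumes the NER step has already produced `(surface_text, label)` pairs, with labels such as `ORG`, `PER` and `LOC` as used by the model linked above. The functions `pseudonymize` and `restore`, the entity list, and the LLM reply are all hypothetical stand-ins, and this is plain Python rather than a LangChain API.

```python
import re
import string


def pseudonymize(text, entities):
    """Swap each entity for a placeholder such as "Company A".

    `entities` is a list of (surface_text, label) pairs, e.g. from an NER model.
    Returns the rewritten text and a placeholder -> original mapping.
    Sketch only: supports at most 26 distinct entities per label.
    """
    names = {"ORG": "Company", "PER": "Person", "LOC": "Location"}
    counters = {}      # label -> how many placeholders issued so far
    to_placeholder = {}  # original surface text -> placeholder
    for surface, label in entities:
        if surface in to_placeholder:
            continue
        idx = counters.get(label, 0)
        counters[label] = idx + 1
        prefix = names.get(label, "Entity")
        to_placeholder[surface] = f"{prefix} {string.ascii_uppercase[idx]}"
    if not to_placeholder:
        return text, {}
    # Single regex pass, longest match first, so replacements never overlap.
    ordered = sorted(to_placeholder, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in ordered))
    safe_text = pattern.sub(lambda m: to_placeholder[m.group(0)], text)
    mapping = {ph: surface for surface, ph in to_placeholder.items()}
    return safe_text, mapping


def restore(text, mapping):
    """Put the original values back into text that contains placeholders."""
    if not mapping:
        return text
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in ordered))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


prompt = "Acme Corp plans to acquire Globex. Summarize the risks for Acme Corp."
entities = [("Acme Corp", "ORG"), ("Globex", "ORG")]  # hypothetical NER output

safe_prompt, mapping = pseudonymize(prompt, entities)
# safe_prompt is what would be sent to the LLM instead of the original prompt.

llm_reply = "Company A faces integration risk when absorbing Company B."  # stand-in reply
final_reply = restore(llm_reply, mapping)
print(safe_prompt)
print(final_reply)
```

The mapping stays on the organization's side, so only the placeholders reach the LLM. Restoration works only if the model repeats the placeholders verbatim, and entities that the NER step misses will still be sent in clear text.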

