Currently we have a web service that clients call to get a PDF file. The web service fetches the file from another system, which returns it as a hex string. Our web service then converts the hex string to bytes and responds to the client with the PDF file, as below:
// Decode the hex payload returned by the other system into raw PDF bytes.
byte[] pdfBytes = ConvertHexStringToBytes(hexValueFromOtherSystem);
HttpResponse _response = Context.Response;
_response.Clear();
_response.ContentType = "application/pdf";
_response.AppendHeader("Content-Disposition", "inline;filename=" + FileName + ".pdf");
_response.BufferOutput = true;
_response.AddHeader("Content-Length", pdfBytes.Length.ToString());
// Write the PDF to the response body and finish the request.
_response.BinaryWrite(pdfBytes);
_response.End();
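For context, the hex-to-bytes step could look roughly like the sketch below. The real `ConvertHexStringToBytes` is not shown here, so this is only an assumed implementation: it expects plain hex digits with no `0x` prefix or separators, and the class name `HexConverter` is made up for illustration.

```csharp
using System;

public static class HexConverter
{
    // Hypothetical stand-in for ConvertHexStringToBytes (the real one is not shown).
    // Assumes the input is plain hex digits, two per byte, with no prefix or separators.
    public static byte[] ConvertHexStringToBytes(string hex)
    {
        if (hex == null) throw new ArgumentNullException(nameof(hex));
        if (hex.Length % 2 != 0)
            throw new FormatException("Hex string must have an even number of characters.");

        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            // Each pair of hex digits encodes one byte.
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        }
        return bytes;
    }
}
```

A PDF starts with the ASCII header `%PDF-`, so a valid payload from the other system should begin with the hex digits `255044462D`.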
But now we need a way to mask, hide, redact, or completely remove some content from the file (sensitive information, depending on the user type) before sending it in the response to clients. This has to happen in real time, within the same request, just as before when we had no masking requirement.
For now, the hiding / masking / removing / redacting is assumed to be based on specific positional areas of the file, i.e. a rectangular region identified by its Left-Top, Right-Top, Right-Bottom, and Left-Bottom corners. There could also be more than one such rectangular region.
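To make the requirement concrete, here is a rough sketch of how the regions per user type might be modelled on our side. All names (`RedactionRegion`, `RedactionPolicy`, `RegionsFor`) are hypothetical, and it assumes coordinates are given in PDF points with the origin at the bottom-left of the page (the default PDF user space), so `Top` is numerically larger than `Bottom`. It only describes which areas to mask; it does not do any redaction itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model: one rectangular area to mask on a given (1-based) page.
// Assumes PDF points with the origin at the bottom-left corner of the page.
public sealed class RedactionRegion
{
    public int Page { get; }
    public double Left { get; }
    public double Bottom { get; }
    public double Right { get; }
    public double Top { get; }

    public RedactionRegion(int page, double left, double bottom, double right, double top)
    {
        if (page < 1) throw new ArgumentOutOfRangeException(nameof(page));
        if (right <= left || top <= bottom)
            throw new ArgumentException("Region must have a positive width and height.");
        Page = page; Left = left; Bottom = bottom; Right = right; Top = top;
    }
}

// Hypothetical lookup: which regions must be masked for which user type.
public sealed class RedactionPolicy
{
    private readonly Dictionary<string, List<RedactionRegion>> _regionsByUserType =
        new Dictionary<string, List<RedactionRegion>>(StringComparer.OrdinalIgnoreCase);

    public void Add(string userType, RedactionRegion region)
    {
        List<RedactionRegion> regions;
        if (!_regionsByUserType.TryGetValue(userType, out regions))
        {
            regions = new List<RedactionRegion>();
            _regionsByUserType[userType] = regions;
        }
        regions.Add(region);
    }

    // Returns an empty list when nothing needs masking for this user type.
    public IReadOnlyList<RedactionRegion> RegionsFor(string userType)
    {
        List<RedactionRegion> regions;
        return _regionsByUserType.TryGetValue(userType, out regions)
            ? regions
            : new List<RedactionRegion>();
    }
}
```

The idea would be to look up `RegionsFor(userType)` after decoding the bytes and hand the resulting rectangles to whichever redaction library ends up being used, before writing the response.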
Or, if PDFs have any concept of line numbers, we could use that if feasible, e.g. mask lines 5, 6, 7, 8, and 9 of the file, from Left-0 to Right-n.
Q1 - The core question is: how can this be achieved using open-source and free libraries, APIs, or SDKs? Or with custom development from scratch? Or must we go for a paid option like PDFTron (which seems to be meant only for Windows applications): https://www.pdftron.com/documentation/samples/cs/PDFRedactTest?platforms=dotnet
Or Syncfusion: https://www.syncfusion.com/blogs/post/easy-ways-to-redact-pdfs-using-c.aspx
Q2 - The second, and in fact more crucial, question on my mind: is it even a good idea to do this in real time? Can it be done within a time frame of about 500 ms?