C#: mask, hide, remove, or redact certain areas in a PDF file

Question

Currently we have a web service that clients call to get a PDF file. The web service fetches the file from another system, which returns it as a hex string. Our web service then converts the hex string to bytes and responds to the clients with the PDF file, as below:

// Decode the hex string returned by the other system into raw PDF bytes
byte[] pdfBytes = ConvertHexStringToBytes(hexValueFromOtherSystem);

// Stream the PDF back to the client, displayed inline in the browser
HttpResponse _response = Context.Response;
_response.Clear();
_response.ContentType = "application/pdf";
_response.AppendHeader("Content-Disposition", "inline;filename=" + FileName + ".pdf");
_response.BufferOutput = true;
_response.AddHeader("Content-Length", pdfBytes.Length.ToString());
_response.BinaryWrite(pdfBytes);
_response.End();
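
For reference, the `ConvertHexStringToBytes` helper is not shown above. A minimal sketch of what it might look like, assuming the other system returns plain hex digits with no separators and no "0x" prefix (the `HexHelper` class name is only for illustration):

```csharp
using System;

public static class HexHelper
{
    // Hypothetical stand-in for the ConvertHexStringToBytes helper used above.
    // Assumes the input contains only hex digits (no spaces, no "0x" prefix).
    public static byte[] ConvertHexStringToBytes(string hex)
    {
        if (hex == null) throw new ArgumentNullException(nameof(hex));
        if (hex.Length % 2 != 0)
            throw new FormatException("Hex string must have an even number of digits.");

        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            // Each byte is built from two hex digits: high nibble, then low nibble
            int high = HexValue(hex[2 * i]);
            int low = HexValue(hex[2 * i + 1]);
            bytes[i] = (byte)((high << 4) | low);
        }
        return bytes;
    }

    // Maps a single hex digit to its numeric value (0-15)
    private static int HexValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        throw new FormatException("Invalid hex digit: " + c);
    }
}
```

Working on characters directly avoids allocating one substring per byte, which may matter for large files when the whole request has a tight time budget.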

But now we need a way to mask, hide, redact, or completely remove some content from the file (sensitive information, depending on the user type) before sending it in the response to clients. This has to happen in real time, just as it did before we had the masking requirement.

For now, the hiding / masking / removing / redacting is assumed to be based on a specific positional area of the file, i.e. a rectangular region identified by its Left-Top, Right-Top, Right-Bottom, and Left-Bottom corners. There could also be more than one such rectangular region.
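
To make the idea of positional regions concrete, here is a rough sketch of how one region might be represented. The `RedactionRegion` type and its members are hypothetical and not tied to any library. It assumes coordinates in PDF points (1/72 inch) and converts from a top-left based rectangle to the bottom-left origin that PDF pages use by default:

```csharp
using System;

// Hypothetical type: one rectangular area to redact on a given page.
// Coordinates are in PDF points, measured from the bottom-left page corner.
public sealed class RedactionRegion
{
    public int Page { get; }
    public double Left { get; }
    public double Bottom { get; }
    public double Width { get; }
    public double Height { get; }

    public RedactionRegion(int page, double left, double bottom, double width, double height)
    {
        Page = page;
        Left = left;
        Bottom = bottom;
        Width = width;
        Height = height;
    }

    // Builds a region from a rectangle described relative to the top-left
    // corner of the page, flipping the Y axis using the page height.
    public static RedactionRegion FromTopLeft(
        int page, double left, double top, double width, double height, double pageHeight)
    {
        return new RedactionRegion(page, left, pageHeight - top - height, width, height);
    }
}
```

Several regions would then simply be a `List<RedactionRegion>`, possibly selected per user type. Note that only drawing a filled rectangle over such a region hides the content visually; the underlying text can still be extracted unless the library actually removes it.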

Or, if PDFs have any concept of line numbers, we could use that if feasible, e.g. mask lines 5, 6, 7, 8, and 9 of the file, from Left-0 to Right-n.

Q1 - The core question is: how can this be achieved using open-source and free libraries, APIs, or SDKs? Or with custom development from scratch? Or must we go for a paid option like PDFTron (which seems to be meant only for Windows applications): https://www.pdftron.com/documentation/samples/cs/PDFRedactTest?platforms=dotnet

Or Syncfusion: https://www.syncfusion.com/blogs/post/easy-ways-to-redact-pdfs-using-c.aspx

Q2 - The second & in fact more crucial question on my mind is, is it even a good idea to do this in real-time? Can it be done in a time frame like within 500 ms?
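
One way to check the 500 ms budget would be to time the redaction step on representative files. A minimal sketch, assuming the redaction is wrapped in some function (the `Timing.Measure` helper is hypothetical; `Stopwatch` is the standard .NET timer):

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Hypothetical helper: runs one processing step and reports how long it took.
    public static T Measure<T>(Func<T> step, out long elapsedMs)
    {
        var stopwatch = Stopwatch.StartNew();
        T result = step();
        stopwatch.Stop();
        elapsedMs = stopwatch.ElapsedMilliseconds;
        return result;
    }
}
```

It could be used along the lines of `byte[] redacted = Timing.Measure(() => Redact(pdfBytes, regions), out long ms);`, where `Redact` stands for whichever library call ends up doing the work. A single timing says little; the numbers would need to be collected under realistic parallel load.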

*"Can it be done in a time frame like within 500 ms?"* - on one hand, that depends on your server resources and the number of requests processed in parallel. On the other hand, the complexity of the PDFs obviously matters. - mkl, Jun 12 '20 at 13:29
