
I am working on an Adobe Acrobat add-on, and one of the requirements is to extract the text that has been marked for redaction in a given PDF document.

Assuming you know what "redaction" is (if not, please read http://acrobatusers.com/tutorials/redacting-pdf-files-survey-tools ), how can I discover the coordinates of text that has been "marked" for redaction in a PDF and then extract exactly that text?

Please ask for more details if you need them to point me toward an answer. I have tried the iTextSharp and Aspose.PDF libraries without much success.

Andrew
Bathla
  • It's not cheap, but PDFLib's TET extension is the most reliable text extraction and manipulation utility I have found: http://www.pdflib.com/products/tet/how-to-use-tet/ I've worked with iTextSharp, Aspose, TallPDF, DynamicPdf, and AbcPdf at various times, but none does as good a job of extraction which, as I am sure you have discovered, can be a bit tricky. I don't have any specific experience finding redacted text, but the TET documentation is comprehensive. – Jude Fisher Aug 24 '12 at 10:35
  • Adobe documents redaction tools as something to "permanently delete confidential data". I would be very surprised if you could extract redacted data from a PDF. – Peter Ritchie Aug 24 '12 at 15:48
  • Peter! My question is not how to extract redacted data from a PDF. I understand that I cannot extract what has been permanently erased from a PDF. My question is how to extract text that has been "MARKED" for redaction (there is a difference!) – Bathla Aug 26 '12 at 15:58

1 Answer


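When you mark text for redaction in Acrobat, it creates redaction annotations on the page. A redaction annotation has its /Subtype key set to /Redact, and the area to be redacted is defined by the /QuadPoints key in the annotation dictionary. Once you have those coordinates, you can extract the page text together with its positions and keep only the text that falls inside the marked area. I do not know whether iTextSharp or Aspose support redaction annotations directly. With iTextSharp you can use its low-level (COS) object API to retrieve the raw PDF objects and inspect the entries you need.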
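As a rough illustration of the coordinate step, here is a minimal sketch in plain Python with no PDF library involved. It assumes you have already read the /QuadPoints array of a /Redact annotation as a flat list of numbers (eight per quadrilateral), and that some text extraction library has given you each word with its bounding box in the same page coordinates. The names `Rect`, `quads_to_rects`, `center_inside`, and `text_marked_for_redaction` are hypothetical helpers for this sketch, not part of iTextSharp or Aspose.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Rect(NamedTuple):
    """Axis-aligned box in PDF user space (lower-left and upper-right corners)."""
    llx: float
    lly: float
    urx: float
    ury: float


def quads_to_rects(quad_points: Sequence[float]) -> List[Rect]:
    """Turn a flat /QuadPoints array into one bounding box per quadrilateral.

    Each quad is 8 numbers: x1 y1 x2 y2 x3 y3 x4 y4. Taking min/max makes the
    result independent of the corner order the producer used.
    """
    if len(quad_points) % 8 != 0:
        raise ValueError("QuadPoints length must be a multiple of 8")
    rects = []
    for i in range(0, len(quad_points), 8):
        xs = quad_points[i:i + 8:2]       # x1, x2, x3, x4
        ys = quad_points[i + 1:i + 8:2]   # y1, y2, y3, y4
        rects.append(Rect(min(xs), min(ys), max(xs), max(ys)))
    return rects


def center_inside(area: Rect, box: Rect) -> bool:
    """True if the center of `box` lies within `area` (an assumed hit test)."""
    cx = (box.llx + box.urx) / 2
    cy = (box.lly + box.ury) / 2
    return area.llx <= cx <= area.urx and area.lly <= cy <= area.ury


def text_marked_for_redaction(
    words: Sequence[Tuple[str, Rect]], quad_points: Sequence[float]
) -> List[str]:
    """Keep the words whose center falls inside any marked quad."""
    areas = quads_to_rects(quad_points)
    return [text for text, box in words
            if any(center_inside(area, box) for area in areas)]
```

The center-point test is only one possible choice; a stricter overlap test may suit text that is partially covered by the marked area. Rotated pages or rotated text would need the quadrilateral itself rather than its bounding box.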

iPDFdev