
I want to extract PDF content using the Apache Tika library. Everything worked until I encountered a PDF that is protected with a username and password. Parsing it fails with the errors below:

INFO Document is encrypted
org.apache.tika.exception.EncryptedDocumentException: Unable to process: document is encrypted
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:153)

Caused by: org.apache.pdfbox.exceptions.CryptographyException: Cannot find an appropriate security handler for Adobe.APS
    at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:952)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:139)
    ... 4 more

Does anyone know whether Apache Tika supports extracting content from PDFs with this kind of security feature?

fattysxx

1 Answer


You can try the following; it worked for me. Supply the password through a PasswordProvider registered on the ParseContext:

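    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.PasswordProvider;

    // The provider returns the password Tika should use to open the document
    PasswordProvider pp = (metadata) -> "password";

    // Register the provider on the ParseContext that is passed to the parser
    ParseContext context = new ParseContext();
    context.set(PasswordProvider.class, pp);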
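For completeness, here is a minimal sketch of how that context might be used in a full extraction call. The class name EncryptedPdfExtractor and the helper methods contextWithPassword and extractText are hypothetical names chosen for illustration, and the sketch assumes tika-core and tika-parsers are on the classpath. Note that this only covers PDFs that open with a password; as far as I know, the Adobe.APS handler named in the stack trace refers to Adobe's policy-server (rights management) protection rather than standard password encryption, so a password alone may not be enough for that particular file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.PasswordProvider;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of Tika itself
public class EncryptedPdfExtractor {

    // Build a ParseContext whose PasswordProvider always returns the given password
    static ParseContext contextWithPassword(String password) {
        ParseContext context = new ParseContext();
        context.set(PasswordProvider.class, metadata -> password);
        return context;
    }

    // Parse the file and return its text content
    static String extractText(Path pdf, String password)
            throws IOException, SAXException, TikaException {
        // -1 disables the default limit on the number of characters collected
        BodyContentHandler handler = new BodyContentHandler(-1);
        try (InputStream in = Files.newInputStream(pdf)) {
            new AutoDetectParser().parse(in, handler, new Metadata(),
                    contextWithPassword(password));
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java EncryptedPdfExtractor <file.pdf> <password>
        System.out.println(extractText(Paths.get(args[0]), args[1]));
    }
}
```

If the password is wrong or the protection scheme is unsupported, the parse call should still throw an EncryptedDocumentException (a subclass of TikaException), so callers may want to catch that case explicitly.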