I want to extract PDF content using the Apache Tika library. Everything works until I encounter a PDF that is encrypted and protected with a username and password. Parsing then fails with the following error:
INFO Document is encrypted
org.apache.tika.exception.EncryptedDocumentException: Unable to process: document is encrypted
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:153)
Caused by: org.apache.pdfbox.exceptions.CryptographyException: Cannot find an appropriate security handler for Adobe.APS
    at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:952)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:139)
    ... 4 more
Does anyone know whether Apache Tika supports extracting content from PDFs that use this kind of security?