
I am using pdfparser to parse and extract text from PDFs in PHP. It works fine for some PDF files, but for others it throws an error: 'Secured pdf file are currently not supported.' When I open the files that pdfparser reports as secured in a PDF reader such as Adobe Reader, they open without any problem.

I have tried a few approaches, such as re-saving the files with file_get_contents and file_put_contents to see whether that helps, but to no avail. Is there any way to parse and read the text from these files? Any help is greatly appreciated.

SAN
  • *Secured pdf file are currently not supported*: perhaps there is a reason why they don't want you to be able to use the content for other things? – Nigel Ren Jun 08 '20 at 06:45
  • You could try to find a PDF parser that supports secured PDFs? – M. Eriksson Jun 08 '20 at 06:45
  • @NigelRen That's a fair point. But I have checked the permissions for those files, and all of them allow copying content, which is exactly what I wanted to do with pdfparser: extract content. – SAN Jun 08 '20 at 13:41
  • @MagnusEriksson I've been trying to find a good alternative. Please let me know if you know of any good alternatives. – SAN Jun 08 '20 at 13:43

1 Answer


A file can be encrypted but have an empty (default) user password. This lets you open the PDF file without entering a password, but (with conforming software) does not let you change the permissions; a separate owner password is required for that.

So it's entirely possible to have a PDF file that is secured with an owner password but has no user password. A PDF consumer that supports encryption can open such a file, but the file is nevertheless encrypted and cannot be opened by a consumer that does not support encryption, such as pdfparser.
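To see why pdfparser and Adobe Reader disagree about the same file, it can help to check whether the file carries an encryption dictionary at all. The PHP sketch below is a rough heuristic, not a real PDF parser: it assumes that an encrypted file exposes an `/Encrypt` key in plain text in its trailer (or cross-reference stream dictionary), and the function name `pdfLooksEncrypted` is hypothetical. It also illustrates why re-saving with `file_get_contents`/`file_put_contents` cannot help: those calls copy the bytes unchanged, encryption dictionary included.

```php
<?php
/**
 * Hypothetical helper (a heuristic sketch, not a full PDF parser).
 * Returns true if the raw PDF bytes contain an /Encrypt key, which the
 * trailer of an encrypted PDF carries even when the user password is
 * empty and the file opens in a reader without any prompt.
 */
function pdfLooksEncrypted(string $bytes): bool
{
    // The \b word boundary prevents a match on names that merely start
    // with "Encrypt" (e.g. /EncryptMetadata) without the /Encrypt key itself.
    return preg_match('#/Encrypt\b#', $bytes) === 1;
}

// Usage sketch; 'document.pdf' is a placeholder path:
// $bytes = file_get_contents('document.pdf');
// if ($bytes !== false && pdfLooksEncrypted($bytes)) {
//     echo "Encrypted: pdfparser will likely reject this file.\n";
// }
```

A match only tells you that the file is encrypted; it says nothing about whether the user password is empty, which is why a reader such as Adobe may still open the file silently.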

As Magnus suggested above, you could use a different PDF consumer, or you could contribute encrypted PDF support to pdfparser.

halfer
KenS
  • Thank you for your answer; I am not proficient enough in PDF parsing to contribute, though! If you know of any other good parser, I'd be happy to hear about it. – SAN Jun 08 '20 at 14:05
  • PSPDFKit, MuPDF, Ghostscript, iText; I'm sure there are others. If all you want to do is extract text, any of those should be able to do so. – KenS Jun 08 '20 at 14:09
  • Yes, extracting text is all I wanted to do, as the permissions on the pdf files I am working on do allow content copying. Thank you for suggesting other parsers, I'll try to see which one is useful for me. – SAN Jun 08 '20 at 14:25
  • @SAN If you want to stay with PHP, you may check out our [SetaPDF-Extractor](https://setasign.com/extractor) (not free). It allows you to access secured/encrypted files, too. – Jan Slabon Jun 08 '20 at 17:32
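If you want to try Ghostscript, one of the tools KenS lists above, without leaving PHP, one option is to shell out to it. This is a minimal sketch under stated assumptions: a `gs` binary is on the `PATH`, the installed Ghostscript version includes the `txtwrite` device, and Ghostscript can open your files because their user password is empty. The function name `extractTextWithGhostscript` is hypothetical, and error handling is kept deliberately thin.

```php
<?php
/**
 * Hypothetical helper: extract text from a PDF by running Ghostscript's
 * txtwrite device and capturing its output from stdout.
 * Returns null if gs is missing or exits with a non-zero status.
 */
function extractTextWithGhostscript(string $pdfPath): ?string
{
    // -q: quiet; -dNOPAUSE -dBATCH: process all pages, then exit;
    // -dSAFER: restrict file access; -sOutputFile=-: write text to stdout.
    $cmd = sprintf(
        'gs -q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=txtwrite -sOutputFile=- %s',
        escapeshellarg($pdfPath) // quote the path so spaces and shell characters are safe
    );

    $lines = [];
    $status = 0;
    exec($cmd, $lines, $status); // exec() strips trailing newlines from each line

    return $status === 0 ? implode("\n", $lines) : null;
}

// Usage sketch; 'document.pdf' is a placeholder path:
// $text = extractTextWithGhostscript('document.pdf');
// if ($text !== null) { echo $text; }
```

Shelling out keeps the PHP side trivial, at the cost of depending on an external binary; a PHP-native library such as the one Jan Slabon mentions avoids that dependency.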