
I'm using Python to extract text from PDFs, but a recent PDF gives me an unacceptable extraction result.

After searching, I found that the PDF's security properties say extraction is not allowed.

I tried printing it to PDF, but the result still shows "page extraction: Not allowed". What should I do to extract the text from this PDF? There is no password, and I can open and print it. Is there any module in Python, or in another language, that can do this?
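The "Not allowed" entries in a viewer's security summary do not necessarily come from the file itself. For an encrypted PDF, the document's own restrictions are stored as a signed 32-bit integer in the `/P` entry of its encryption dictionary (ISO 32000-1, Table 22); an unencrypted file has no `/P` entry and imposes no such restrictions. A minimal sketch of decoding that integer, assuming you have already read the raw `/P` value with some PDF library (the flag names below are illustrative labels, not part of any library's API):

```python
# Bit positions (1-based, as numbered in ISO 32000-1 Table 22) of the
# user access permissions stored in the /P entry of the encryption dictionary.
# The dictionary keys are hypothetical, descriptive labels.
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "extract_text_and_graphics": 5,
    "annotate": 6,
    "fill_forms": 9,
    "extract_for_accessibility": 10,
    "assemble": 11,
    "print_high_quality": 12,
}


def decode_permissions(p_value: int) -> dict:
    """Decode the signed 32-bit /P integer into named permission flags.

    Python ints use two's complement semantics for negative numbers,
    so typical negative /P values such as -4 or -3904 work directly.
    """
    return {
        name: bool(p_value & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }


if __name__ == "__main__":
    # -4 sets every permission bit; -3904 clears all of them.
    print(decode_permissions(-4))
    print(decode_permissions(-3904))
```

Bit 5 (`extract_text_and_graphics`) is the one that governs copying or otherwise extracting text; the viewer may still add restrictions of its own on top of what `/P` says.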

  • Congratulations on your first post. I'm afraid Stack Overflow is not the place to ask questions about how to overcome the security features of a software program. – DaveL17 Dec 10 '20 at 12:11
  • Thanks, but I don't think it is a security issue, because text can be extracted from the PDF, just with unreadable output. I worked out that every (cid:number) represents a letter, e.g. (cid:68) represents a and (cid:36) represents A, (cid:70) represents b and (cid:37) represents B. But I really don't know what the algorithm is or why this happens in the first place! If you can suggest anywhere I could look, please tell me. – Fatma Abdou Dec 10 '20 at 13:37
  • **(A)** The issue is unrelated to the restrictions you found. The summary you show refers not only to restrictions imposed by the document but also by the viewer, see [this answer](https://stackoverflow.com/a/65021738/1729265) for details. You apparently looked at the summary using Acrobat Reader. If you had used Acrobat Pro, you'd have seen an "Allowed" there. – mkl Dec 10 '20 at 14:45
  • **(B)** The actual problem most likely is that your PDF simply does not contain the information required for text extraction; i.e. it uses character identifiers (cids) which point to drawing instructions for the glyph in a font program, so viewers know how to draw the glyph in question, but it does not indicate how to map from these character ids to Unicode code points. – mkl Dec 10 '20 at 14:47
  • Thanks, that was helpful to know. I think I will map these CIDs to the actual characters. Do you suggest anything other than mapping? – Fatma Abdou Dec 11 '20 at 09:36
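Following the mapping idea in the comments, here is a minimal sketch. It assumes the extractor emits unmapped glyphs as `(cid:N)` tokens (pdfminer.six does this, for example), and `OBSERVED_CID_MAP` holds only the four pairs reported in the comments, so it is a hypothetical, incomplete table valid only for this particular PDF's font. `learn_cid_map` is a hypothetical helper that builds the table from a passage whose true text you can read off the rendered page, assuming the extracted passage and the known text line up character for character.

```python
from __future__ import annotations

import re

# Matches the "(cid:N)" placeholders emitted for glyphs without a Unicode mapping.
CID_PATTERN = re.compile(r"\(cid:(\d+)\)")
# Splits extracted text into either a CID token or a single literal character.
TOKEN_PATTERN = re.compile(r"\(cid:(\d+)\)|(.)", re.DOTALL)

# Hypothetical partial table built from the pairs observed in this one PDF.
# It is specific to that document's font; other fonts will need their own table.
OBSERVED_CID_MAP = {68: "a", 36: "A", 70: "b", 37: "B"}


def decode_cids(text: str, cid_map: dict[int, str]) -> str:
    """Replace every known "(cid:N)" token with its character.

    Unknown CIDs are left untouched so gaps in the table stay visible.
    """
    def substitute(match: re.Match) -> str:
        return cid_map.get(int(match.group(1)), match.group(0))

    return CID_PATTERN.sub(substitute, text)


def learn_cid_map(extracted: str, known_text: str) -> dict[int, str]:
    """Build a CID table by aligning extracted output with text known to be correct.

    Assumes one token (a CID or a literal character such as a space) per
    character of known_text, in the same order.
    """
    tokens = [m.groups() for m in TOKEN_PATTERN.finditer(extracted)]
    if len(tokens) != len(known_text):
        raise ValueError("extracted sample and known text do not line up")
    table: dict[int, str] = {}
    for (cid, _literal), char in zip(tokens, known_text):
        if cid is None:
            continue  # literal character the extractor already decoded
        key = int(cid)
        if table.get(key, char) != char:
            raise ValueError(f"cid {key} maps to both {table[key]!r} and {char!r}")
        table[key] = char
    return table
```

For example, `decode_cids("(cid:36)(cid:68) (cid:99)", OBSERVED_CID_MAP)` gives `"Aa (cid:99)"`, leaving the unknown CID visible. Because the CIDs point into the font's drawing instructions rather than into any Unicode table, the mapping generally has to be rebuilt for each font.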
