Getting corrupted PDF file after reading and rewriting it to a new file

Asked Mar 20 '19 at 11:45


I am exploring the PDF file format and trying to edit and manipulate its internal data. The problem is that I always get a corrupted file after making even a minor change. So I tried a very simple example: read the PDF data and write it, unchanged, to a new file (output.pdf), as follows:

```python
file = open('sample.pdf','r',encoding='ansi').read()
file_ = open('output.pdf','w').write(file)
```

But again I got a corrupted file: Adobe Reader can't open it. I then opened it in Google Chrome, which displayed it, but with the text rendered in a default font instead of the original one.
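One plausible way a text-mode round trip changes a binary file is newline translation. The sketch below simulates the read/write pair in memory with `io.TextIOWrapper`. It assumes Latin-1 decoding (which maps every byte to a character and back unchanged, so only newline handling is at play) in place of the Windows-only `ansi` codec, and it assumes the write side translates LF to CRLF as text-mode writes do on Windows. The function name `text_round_trip` is hypothetical.

```python
import io


def text_round_trip(data: bytes, encoding: str = "latin-1") -> bytes:
    """Simulate open(..., 'r').read() followed by open(..., 'w').write() in memory.

    Assumption: Latin-1 stands in for the Windows-only 'ansi' codec, and the
    writer translates LF to CRLF the way text-mode writes do on Windows.
    """
    # Text-mode reads use universal newlines: CRLF and a bare CR both become LF.
    reader = io.TextIOWrapper(io.BytesIO(data), encoding=encoding)
    text = reader.read()

    # Text-mode writes on Windows turn every LF back into CRLF.
    out = io.BytesIO()
    writer = io.TextIOWrapper(out, encoding=encoding, newline="\r\n")
    writer.write(text)
    writer.flush()
    result = out.getvalue()
    return result


if __name__ == "__main__":
    # Made-up bytes resembling binary stream data with a bare CR and a bare LF.
    original = b"stream\r\x00\x81\nendstream\r\n"
    rewritten = text_round_trip(original)
    print(len(original), len(rewritten))  # 21 23
    print(rewritten == original)  # False
```

Bare CR and bare LF bytes inside binary data both come back as CRLF, so the file grows and every later byte shifts, even though the text looks the same in an editor.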

I opened the input and output files in Notepad++ and compared them, and the two files matched exactly!
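A text editor shows decoded text, so differences such as CRLF versus LF line endings may not be visible there. Comparing the raw bytes avoids that. Below is a minimal sketch; the helper names `first_difference` and `first_difference_in_files` are hypothetical.

```python
from pathlib import Path
from typing import Optional, Tuple


def first_difference(a: bytes, b: bytes) -> Optional[Tuple[int, Optional[int], Optional[int]]]:
    """Return (offset, byte_in_a, byte_in_b) for the first mismatch, or None if identical.

    A byte value of None means that input ended before the offset.
    """
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset, x, y
    if len(a) != len(b):
        # One input is a prefix of the other: the mismatch is where the shorter one ends.
        n = min(len(a), len(b))
        return n, a[n] if n < len(a) else None, b[n] if n < len(b) else None
    return None


def first_difference_in_files(path_a: str, path_b: str):
    """Compare two files as raw bytes, with no decoding or newline translation."""
    return first_difference(Path(path_a).read_bytes(), Path(path_b).read_bytes())
```

For example, `first_difference_in_files("sample.pdf", "output.pdf")` returns `None` only if the two files are byte-for-byte identical; an offset pointing at a `13` versus `10` byte would indicate a line-ending change.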

I also opened the output file, copied its content, and pasted it into the input file, and surprisingly, the input file still worked exactly as before.

Any ideas what the problem is?

asked Mar 20 '19 at 11:45

Ahmed Hawary


PDFs are binary files and you're treating them as text files, so naturally information is going to be lost. – Ian Kemp Mar 20 '19 at 11:59
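Following that comment, here is a minimal sketch of the same copy done in binary mode. With `'rb'`/`'wb'` the data is read and written as `bytes`, with no decoding and no newline translation, so the output should be byte-for-byte identical. The helper name `copy_binary` and the sample payload are hypothetical; for a plain copy, `shutil.copyfile(src, dst)` does the same job.

```python
import tempfile
from pathlib import Path


def copy_binary(src, dst) -> None:
    """Copy src to dst byte-for-byte; binary mode skips decoding and newline translation."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        data = fin.read()  # bytes, not str
        # Any edits would be made on `data` here, as bytes (e.g. data.replace(b"old", b"new")).
        fout.write(data)


if __name__ == "__main__":
    # Made-up payload with bare CR, bare LF and non-ASCII bytes, like binary PDF content.
    payload = b"%PDF-1.4\r\n%\xe2\xe3\xcf\xd3\nstream\r\x00\x81\x8d\nendstream\n"
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "sample.pdf"), Path(tmp, "output.pdf")
        src.write_bytes(payload)
        copy_binary(src, dst)
        print(dst.read_bytes() == payload)  # True
```

If you do edit the bytes, keep in mind that a PDF's cross-reference table records the byte offset of each object and each stream declares its `/Length`, so an edit that changes lengths generally means updating those as well; a PDF library may be easier than editing by hand.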
This looks very much like [this question](https://stackoverflow.com/q/55261941/1729265)... – mkl Mar 20 '19 at 14:58


