I'm testing pdftk and finding that when I burst a multipage PDF into separate single-page PDFs and then generate an MD5 checksum (digital fingerprint) for each of those single-page files, I get a different hash every time I do the burst. This happens even when the input is the exact same file with no changes.
My test process is:
- Decompress test.pdf (a simple text-only PDF that contains 10 pages)
- Using pdftk, burst (split) test.pdf into 10 separate PDF files (1 page per file)
- Generate an MD5 checksum for each of the 10 single-page PDF files
- Record the 10 hash checksums
- Repeat the four steps above
- Note that every hash differs from the one recorded for the same page in the previous run
Side note: generating a checksum on the full decompressed PDF (before bursting) yields the exact same checksum every time I repeat it.
I'm using Node.js and its crypto module for this exercise.
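For reference, the hashing and comparison steps look roughly like the sketch below. It is a simplified version rather than my exact script: the function names (`md5OfFile`, `hashPdfsInDir`, `changedFiles`) and the directory names `run1/` and `run2/` are just illustrative, and it assumes each burst's output pages were written to their own directory.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Return the MD5 hex digest of a file's raw bytes.
function md5OfFile(filePath) {
  const data = fs.readFileSync(filePath);
  return crypto.createHash('md5').update(data).digest('hex');
}

// Hash every .pdf file in a directory; returns an object of { fileName: digest }.
function hashPdfsInDir(dir) {
  const result = {};
  for (const name of fs.readdirSync(dir).sort()) {
    if (path.extname(name).toLowerCase() === '.pdf') {
      result[name] = md5OfFile(path.join(dir, name));
    }
  }
  return result;
}

// List the file names whose digest differs between two runs.
function changedFiles(runA, runB) {
  return Object.keys(runA).filter((name) => runA[name] !== runB[name]);
}

// Example usage (hypothetical directories holding two separate bursts):
//   const first = hashPdfsInDir('run1');
//   const second = hashPdfsInDir('run2');
//   console.log(changedFiles(first, second));
```

In my tests, the comparison reports every one of the 10 page files as changed between the two runs.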
My question is: Why do the checksums differ upon repetition? I would think that the resulting 10 single-page files are exactly the same as the last time they were created. Their parent document (and thus the individual pages themselves) has not changed at all.