I've got a bunch of files that, judging by their metadata, are supposed to be PDFs. Some of them are indeed complete PDFs. Some of them appear to be the first part of a PDF file, though they lack the %%EOF marker and other footer values.
Others appear to be the last part of PDF files (they don't have any of a PDF's headers, but they do have the %%EOF stuff). Curiously, they start with the following 16-byte magic header:
0x50, 0x4B, 0x57, 0x41, 0x52, 0x45, 0x00, 0x00, 0x00, 0x00, 0x00, 0x57, 0x49, 0x4E, 0x33, 0x32
(that is, PKWARE, five NUL bytes, then WIN32).
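For anyone wanting to check their own files, here is a minimal Python sketch of testing for that magic. The byte values are copied from the listing above; the names MAGIC and starts_with_magic are just illustrative and not part of any known tool or format specification.

```python
# The 16-byte magic quoted above: "PKWARE", five NUL bytes, "WIN32".
MAGIC = bytes([
    0x50, 0x4B, 0x57, 0x41, 0x52, 0x45,   # "PKWARE"
    0x00, 0x00, 0x00, 0x00, 0x00,         # five NUL bytes
    0x57, 0x49, 0x4E, 0x33, 0x32,         # "WIN32"
])


def starts_with_magic(data: bytes) -> bool:
    """Return True if the buffer begins with the 16-byte magic."""
    return data.startswith(MAGIC)
```

To try it on a file, read the first 16 bytes in binary mode (open(path, "rb").read(16)) and pass them to starts_with_magic.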
I'm doing a lot of inference here, which could be misleading, but it doesn't seem to be a compression scheme (the %%EOF stuff is plaintext), and in the few files I've been allowed to look at deeply there's a correlation between starting with this magic and looking like the final segment of a PDF binary.
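The correlation described above could be tallied across a larger set of files with something like the following sketch. It assumes a PDF header means the file starts with %PDF- at offset 0 and that %%EOF, if present, sits within the last 1024 bytes; both are simplifications I am choosing for illustration, and the category labels are made up.

```python
PDF_HEADER = b"%PDF-"
PDF_EOF = b"%%EOF"
MAGIC = b"PKWARE\x00\x00\x00\x00\x00WIN32"


def classify(data: bytes) -> str:
    """Sort a file's bytes into one of the rough categories described above."""
    has_header = data.startswith(PDF_HEADER)
    # Assumption: the trailer marker lies somewhere in the final 1024 bytes.
    has_eof = PDF_EOF in data[-1024:]
    has_magic = data.startswith(MAGIC)

    if has_header and has_eof:
        return "complete"
    if has_header:
        return "head-only"
    if has_magic and has_eof:
        return "magic+tail"
    if has_magic:
        return "magic-only"
    return "unknown"
```

Running classify over every file and counting the labels would show how often the magic and a trailing %%EOF actually coincide.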
Does anyone have any hints as to what file format might be at play here?
Update: I've now observed this PKWARE WIN32 header on non-PDF files as well. Speculation also suggests that these files are split up in a similar manner.
Update 2: It turns out this PKWARE WIN32 header actually occurs at repeating intervals, whose locations can be predicted from some bytes immediately prior to the header.
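A rough way to study those repeats is to list every offset at which the magic occurs, the distance to the next occurrence, and the bytes just before each one. The sketch below does that in Python; the function names and the default context width of 8 bytes are arbitrary choices of mine, since I don't yet know how many preceding bytes actually matter.

```python
from typing import List, Optional, Tuple

MAGIC = b"PKWARE\x00\x00\x00\x00\x00WIN32"


def find_magic_offsets(data: bytes) -> List[int]:
    """Return every offset at which the 16-byte magic occurs."""
    offsets = []
    pos = data.find(MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(MAGIC, pos + 1)
    return offsets


def describe(data: bytes, context: int = 8) -> List[Tuple[int, str, Optional[int]]]:
    """For each hit: (offset, hex of preceding bytes, gap to next hit or None)."""
    offsets = find_magic_offsets(data)
    rows = []
    for i, off in enumerate(offsets):
        # Bytes immediately before the magic (fewer if near the file start).
        prefix = data[max(0, off - context):off]
        gap = offsets[i + 1] - off if i + 1 < len(offsets) else None
        rows.append((off, prefix.hex(), gap))
    return rows
```

Comparing the hex prefix column against the gap column across several files should make it easier to see which of the preceding bytes encode the distance to the next header.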
I've also received some circumstantial hearsay suggesting that these files are compressed rather than split into multiple parts, though in 2 of the 3 cases where I was told the output file sizes, my binaries were only negligibly smaller.
The mystery continues.