
We have PPMd decompression code that was copied and pasted from Dmitry Shkarin's original 1997 code (judging by the few comments in it). The code is mostly uncommented, and I can't work out how it operates. It uses a suballocator, but it doesn't just allocate from it and free to it; the calling code also manipulates the free block list directly in various ways I can't decipher yet.

We have found a fuzzed sample that causes the code to crash, and I was assigned to fix it.

But in order to tackle the problem I need to understand how the decompression works (I'm only interested in decompression).

Google wasn't very helpful either. The search results are dominated by a gamer with the same nickname and by feature lists of various archivers. I eventually found a Russian website with the algorithm specification, but it is only available through the Wayback Machine and only in Russian, which I cannot read.

It also looks like it's only a mathematical description. So far I have found nothing that specifies how PPMd-compressed data is laid out in a compressed file or how it is consumed during decompression.

Can anyone who understands the PPMD algorithm give me some pointers?

Ideally I'm looking for documents that explain the structure of PPMd-encoded data, something as detailed as RFC 1951 is for DEFLATE.

UPDATE:

Well, it turns out the code has quite a few fishy things.

For example this one:

```cpp
MaxContext=FoundState->Successor;   return;
}
*pText++ = FSymbol;                     Successor = (PPM_CONTEXT*) pText;
if (pText >= UnitsStart)                goto RESTART_MODEL;
if ( FSuccessor ) {
    if ((BYTE*) FSuccessor < UnitsStart)
```

It writes a symbol into a byte buffer, then casts the pointer into that buffer to a pointer to a struct that itself contains pointers.
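To make the pattern concrete, here is a minimal sketch of one possible reading, which I have not confirmed against the original code: a single heap holds a raw text area at low addresses and structured units at or above `UnitsStart`, and a successor field can point into either region, with the address comparison telling them apart. The `Arena` type and its methods are hypothetical names of mine, and the sketch stores the successor as `const void*` instead of `PPM_CONTEXT*` to sidestep alignment questions.

```cpp
#include <cstddef>
#include <vector>

typedef unsigned char BYTE;

// Hypothetical, simplified model: one heap, text area below UnitsStart,
// structured units at or above it.
struct Arena {
    std::vector<BYTE> heap;  // single allocation holding both regions
    BYTE* pText;             // next free byte of the text area
    BYTE* UnitsStart;        // first address that belongs to the units area

    Arena(std::size_t heapSize, std::size_t textSize)
        : heap(heapSize),
          pText(heap.data()),
          UnitsStart(heap.data() + textSize) {}

    Arena(const Arena&) = delete;             // the raw pointers would dangle
    Arena& operator=(const Arena&) = delete;

    // Append one symbol and return a "raw" successor: an address inside the
    // text area. It marks a position in the input history and is not a
    // pointer to a constructed context object.
    const void* appendSymbol(BYTE symbol) {
        *pText++ = symbol;
        return pText;
    }

    // Mirrors the `pText >= UnitsStart` check: the text area has run into
    // the units area, so the model would have to be restarted.
    bool textAreaFull() const { return pText >= UnitsStart; }

    // Mirrors the `(BYTE*) FSuccessor < UnitsStart` check: addresses below
    // UnitsStart are raw text positions, everything else is a real unit.
    bool isRawSuccessor(const void* successor) const {
        return static_cast<const BYTE*>(successor) < UnitsStart;
    }
};
```

If this reading is right, the cast is not reinterpreting text bytes as a struct; it only borrows the pointer-sized field to store a text position, and the `< UnitsStart` test decides which interpretation applies before anything is dereferenced.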

Then in the CreateSuccessors function there is more sorcery:

```cpp
ct.oneState().Successor=(PPM_CONTEXT*) (((BYTE*) UpBranch)+1);
```

UpBranch and ct.oneState().Successor are both PPM_CONTEXT pointers, and I can't imagine what the purpose of a statement like this would be. As I said, this structure contains pointers that may eventually be dereferenced. I tried setting these pointers to NULL to see whether they are used, and it turns out they are indeed dereferenced (at least in the second case).
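One reading that would make the `+1` meaningful, again only an assumption on my part: if `UpBranch` is really a position in the raw text buffer that happens to be stored in a `PPM_CONTEXT*`, then adding one byte simply advances to the next input symbol. The sketch below uses hypothetical names (`OneState`, `stateFromRawBranch`) and a `const void*` field in place of the real pointer type.

```cpp
typedef unsigned char BYTE;

// Hypothetical, simplified stand-in for a single-symbol state.
struct OneState {
    BYTE Symbol;            // symbol this state predicts
    const void* Successor;  // raw text position following that symbol
};

// Assumes upBranch points into a byte buffer of already-seen symbols,
// not at a constructed context object.
OneState stateFromRawBranch(const void* upBranch) {
    const BYTE* text = static_cast<const BYTE*>(upBranch);
    OneState state;
    state.Symbol = *text;        // the byte at the branch position
    state.Successor = text + 1;  // one byte further, like ((BYTE*) UpBranch)+1
    return state;
}
```

Under that assumption the pointer is dereferenced, but only as a byte, which would be consistent with the code crashing when the field is set to NULL.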

Calmarius
  • Is https://www.codeproject.com/Articles/1180/Using-PPMD-for-compression any help? Also, https://stackoverflow.com/questions/3454371/ppmd-compression-in-java gives some hints as to where you might find other implementations to study. – Jim Mischel Jul 25 '18 at 12:54
  • @JimMischel Both of these examples appear to depend on the code I have trouble with. They don't explain the file format either. – Calmarius Jul 25 '18 at 13:47
  • @Calmarius how much code are we talking about, too much to post here or...? – AakashM Jul 25 '18 at 13:59
  • @AakashM The PPMd code our code is based on is here: https://github.com/andyvand/freearc_src/tree/master/Compression/PPMD – Calmarius Jul 25 '18 at 14:20
  • Wow, good luck with that. [Matt Mahoney](http://mattmahoney.net/dc/dce.html) might be able to help you out. Also that tutorial will give you a start on understanding how ppmd works, but how to decompress is very likely only determinable from the source code itself. – Mark Adler Jul 25 '18 at 17:47
