
I need to decompress some zlib-compressed files found within a game's save data. I have no access to the game's source. Each file begins with 0x789C, which tells me that they are indeed compressed with zlib. However, every call to inflate on these files fails partway through and returns Z_DATA_ERROR. I have tried zlib versions 1.2.5, 1.2.8, and 1.2.11 with identical results.
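For reference, the two header bytes can be checked mechanically. The following is a minimal sketch based on the zlib header layout in RFC 1950; the `ZlibHeader` struct and `parseZlibHeader` function are hypothetical names introduced here for illustration, not part of zlib.

```cpp
#include <cstdint>

// Hypothetical helper type: the decoded fields of a two-byte zlib header.
struct ZlibHeader {
    bool valid;       // method is deflate, window size is legal, check bits pass
    int  method;      // CM field: 8 means deflate
    int  windowBits;  // CINFO + 8: 15 means a 32 KiB window
    bool presetDict;  // FDICT flag: the stream announces a preset dictionary
    int  level;       // FLEVEL field: compression-level hint (0..3)
};

// Decode the CMF and FLG bytes as laid out in RFC 1950.
ZlibHeader parseZlibHeader(std::uint8_t cmf, std::uint8_t flg)
{
    ZlibHeader h{};
    h.method     = cmf & 0x0F;
    h.windowBits = (cmf >> 4) + 8;
    h.presetDict = (flg & 0x20) != 0;
    h.level      = flg >> 6;
    // The 16-bit value CMF*256 + FLG must be a multiple of 31.
    const unsigned check = (static_cast<unsigned>(cmf) << 8) | flg;
    h.valid = h.method == 8 && h.windowBits <= 15 && check % 31 == 0;
    return h;
}
```

For the bytes 0x78 0x9C this reports deflate with a 32 KiB window and the FDICT flag clear, so the header itself does not announce a preset dictionary.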

Even though zlib is telling me the input data is corrupt, I'm confident that it is not since the game is able to decompress these files with no issues AND this is not isolated to a single data stream. I have hundreds of thousands of unique data streams compressed the same way and they all throw a Z_DATA_ERROR somewhere in the middle of the decompression.

Furthermore, the portion of the data that IS successfully decompressed is correct. The output is exactly as expected.

Also, about 10% of the time, zlib WILL decompress the entire file, but the result is not correct. Large chunks of the decompressed data contain the same byte repeated over and over, which tells me it was a false positive.

Here's my decompression code:

//QByteArray is a Qt wrapper for a char *
QByteArray Compression::DecompressData(QByteArray data)
{
    QByteArray result;

    int ret;
    z_stream strm;
    static const int CHUNK_SIZE = 1;//set to 1 just for debugging
    char out[CHUNK_SIZE];

    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    strm.avail_in = data.size();
    strm.next_in = (Bytef*)(data.data());

    ret = inflateInit2(&strm, -15);
    if (ret != Z_OK)
    {
        qDebug() << "init error" << ret;
        return QByteArray();
    }

    do
    {
        strm.avail_out = CHUNK_SIZE;
        strm.next_out = (Bytef*)(out);

        ret = inflate(&strm, Z_NO_FLUSH);
        qDebug() << "debugging output: " << ret << QString::number(strm.total_in, 16);//This tells me which input byte caused the failure
        Q_ASSERT(ret != Z_STREAM_ERROR);

        switch (ret)
        {
        case Z_NEED_DICT:
            ret = Z_DATA_ERROR;
        case Z_DATA_ERROR:
        case Z_MEM_ERROR:
            (void)inflateEnd(&strm);
            return result;
        }

        result.append(out, CHUNK_SIZE - strm.avail_out);
    } while (strm.avail_out == 0);

    inflateEnd(&strm);
    return result;
}

Here is a pastebin of an example file's compressed data, with the 0x789C header and trailing CRC removed. I can supply literally endless example files. All of them have the same issue.

Running that data through the above function will decompress the beginning of the stream correctly, but fail on input byte 0x18C. You can tell it decompressed correctly when the start of the file begins with 0x000B and the decompressed data is longer than the input data.

I wish I knew more about deflate compression to solve this problem myself. My initial thoughts are that the game has decided to use a custom version of zlib or an extra parameter needs to be given to zlib in order to decompress it correctly. I've asked around and tried many things for days, and I really need someone with knowledge on the subject to weigh in here. Thanks for your time!

mrg95
  • If you really want to get to the bottom of this, it might be helpful to provide a larger sampling of savegames, and/or identify the game so other people can produce their own. – mwfearnley Feb 27 '19 at 22:26
  • @mwfearnley there are thousands of these compressed files in this single savegame – mrg95 Feb 28 '19 at 23:02
  • Oh, OK.. but it's just the one sample stream you've pasted, right? Possibly looking at multiple streams would make it possible to find a consistent way that the data is being mangled... Also, I was wondering how you're able to verify the correctness of the partial data? – mwfearnley Mar 01 '19 at 10:31
  • I'm able to verify the correctness because I know what to expect in the output. I'm very experienced regarding the save format of this specific game; it's just this one edition of the game that has different compression. I've attempted to decompress all of the streams in batch. A very, very small percentage of them do decompress with no errors, but the data is still partially wrong. I may post all the streams if that would help – mrg95 Mar 02 '19 at 11:04

1 Answer


The provided data is indeed an invalid deflate stream, both with distances too far back, and eight bytes of junk after the deflate stream has ended. There is nothing apparently wrong with your code.

As you noted, at offset 396 there is the first of ten distances too far back. That's where inflate stops. At offset 3472, almost at the end, there is a stored block with a length that doesn't check against its complement.

For the distances too far, you could try setting a dictionary of 32K zero bytes using inflateSetDictionary() right after inflateInit2(). Then the decompression would proceed, filling in the given locations with zeros. That may or may not be what the game is doing. There is no obvious remedy for the stored-block error.

Indeed, the game's authors may be deliberately messing with you or anyone else trying to decompress their internal data, by having modified zlib for their own use.

Mark Adler
  • Thanks Mark! Yes, I meant offset. I find it interesting that you got beyond byte 396. All other software I've tried this on hasn't made it past 396 either. I'm curious to know if it's also dependent on something else that's in memory at any given time? Interestingly, in the cases when zlib finished but the output has large chunks of repeated data in spots where there shouldn't be any, it's always in the same location. Definitely an indicator to me that it's not just scrambled, but instead a custom compression thing. Can you tell what customization they used? Or is it more complex than that? – mrg95 May 13 '18 at 20:14
  • Can you send me the decompressed output you have so I can check it against my expectations? – mrg95 May 13 '18 at 20:47
  • My bad -- I missed an earlier error message from my tool. It will stop at 396. See updated answer. – Mark Adler May 13 '18 at 22:24
  • Thanks for the valuable information! You sir are a legend :) – mrg95 May 13 '18 at 22:45
  • The 'last block' bit flag is set on the first block (the first octet in the source is an odd number), so there's no block after it - the last 7 bytes are not part of the stream, although the `0f 98 79 17` might be a checksum.. Seems like the only problem is the reads that are too far back. If the dictionary isn't filled with zeros, you might be able to reconstruct it by comparing those sections with the correct output. – mwfearnley Mar 02 '19 at 15:09
  • @mwfearnley Thanks, corrected. There are eight bytes after the end of the deflate stream. – Mark Adler Mar 02 '19 at 16:12
  • @MarkAdler you're right, it's 8 bytes - I forgot to discount the need for a second 3-bit block header. – mwfearnley Mar 02 '19 at 16:29
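The 'last block' observation in the comments above can be checked directly on the first byte of a raw deflate stream. This is a minimal sketch following the block header layout in RFC 1951; `BlockHeader` and `readFirstBlockHeader` are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Hypothetical helper type: the 3-bit header that starts every deflate block.
struct BlockHeader {
    bool isFinal;  // BFINAL: set when this is the last block in the stream
    int  type;     // BTYPE: 0 = stored, 1 = fixed Huffman, 2 = dynamic Huffman, 3 = reserved
};

// Deflate packs bits starting from the least significant bit of each byte, so
// the first block's header is the low three bits of the stream's first byte.
BlockHeader readFirstBlockHeader(std::uint8_t firstByte)
{
    BlockHeader h{};
    h.isFinal = (firstByte & 0x01) != 0;
    h.type    = (firstByte >> 1) & 0x03;
    return h;
}
```

An odd first byte therefore means the first block is also the last one, so any bytes left over after that block's end-of-block code are not part of the deflate stream.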