.NET does not have a ZlibStream, so I am trying to implement my own on top of DeflateStream, which .NET does provide. DeflateStream apparently does not support preset dictionaries either, so I skip those in my ZlibStream as well.

Writing works well but I have a problem with my Read method.
Here is my Read method:

public override int Read(byte[] buffer, int offset, int count)
{
    EnsureDecompressionMode();
    if (!_readHeader)
    {
        // read the header (CMF|FLG|optional DIC)
        _readHeader = true;
    }

    var res = _deflate.Read(buffer, offset, count);
    if (res == 0) // EOF
    {
        // read adler32 checksum
        BaseStream.ReadFully(_scratch, 0, 4);
        var checksum = (uint)_scratch[0] << 24 |
                       (uint)_scratch[1] << 16 |
                       (uint)_scratch[2] << 8 |
                       (uint)_scratch[3];
        if (checksum != _adler32.Checksum)
        {
            throw new ZlibException("Invalid checksum");
        }
    }
    else
    {
        _adler32.CalculateChecksum(buffer, offset, res);
    }
    return res;
}

Where:

  • _scratch is a byte[4] used as a temporary buffer
  • _deflate is a DeflateStream.

Zlib's format is CMF|FLG|optional DICT|compressed data|adler32, so I need a way to stop reading when the adler32 is reached. Initially I thought DeflateStream would return EOF once the compressed data ends, but it turns out that it keeps reading until EOF of the underlying stream, so it consumes the adler32 as if it were compressed data. When I then try to read the adler32 from BaseStream inside the if block, an EOF exception is thrown.

So how do I make DeflateStream stop reading the adler32 as if it's compressed data and instead EOF there or do something equivalent, so that I can read adler32 from the BaseStream without compression?
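For reference, the two header bytes in the layout above can be validated along these lines. This is a minimal sketch based on RFC 1950; the `ZlibHeader` class and `TryParse` name are hypothetical and not part of the ZlibStream in the question.

```csharp
using System;

public static class ZlibHeader
{
    // Returns true if cmf and flg form a valid zlib header (RFC 1950).
    // hasDictionary reports whether a 4-byte DICTID follows the header.
    public static bool TryParse(byte cmf, byte flg, out bool hasDictionary)
    {
        hasDictionary = (flg & 0x20) != 0;            // FDICT is bit 5 of FLG
        int method = cmf & 0x0F;                      // CM: 8 means deflate
        int windowBits = (cmf >> 4) + 8;              // CINFO is log2(window size) - 8
        bool checkOk = ((cmf << 8) | flg) % 31 == 0;  // FCHECK makes the 16-bit value a multiple of 31
        return method == 8 && windowBits <= 15 && checkOk;
    }
}
```

For example, the common header bytes 0x78 0x9C pass this check with `hasDictionary` set to false.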

wingerse
  • If what you're saying is true, that it doesn't stop at the end of the deflate stream, then the .NET `DeflateStream` decompression implementation is brain dead and useless. Based on Microsoft's past history with these classes, I would _completely_ believe that. You should try [DotNetZip](https://www.nuget.org/packages/DotNetZip/) instead. – Mark Adler Sep 16 '17 at 16:43
  • Thanks Mr. Adler :). Unfortunately DotNetZip does not support .NET Core yet. I guess I will have to use MemoryStreams and let DeflateStream read until length - 4. – wingerse Sep 16 '17 at 17:02
  • Plus I just noticed that `DeflateStream` doesn't seem to care if the checksum is not matching, it just returns without an error. – Ray Dec 23 '17 at 20:24

2 Answers

Since files have a fixed size, can't you simply stop at base.Length - sizeof(int)? Adjust the read buffer if necessary and then read the uncompressed checksum.

Something like:

public override int Read(byte[] buffer, int offset, int count)
{
    // read header...

    int res = -1;
    if (base.Position + count - offset > base.Length)
    {
        // EOF, skip the last four bytes (adler32) and read them without decompressing
        res = _deflate.Read(buffer, offset, count - sizeof(int));
    }
    else
    {
        res = _deflate.Read(buffer, offset, count);
    }

    // continue processing the data
    return res;
}

not tested

Michael
  • I intend to use ZlibStream with all kinds of streams including NetworkStream which doesn't support `Length` – wingerse Sep 15 '17 at 12:16
  • @WingerSendon I really don't have any experience with data compression, but according to Wikipedia (curiosity :p) each compressed data block has a leading bit that marks the block either as the last block (bit=1) or as one that will be followed by other blocks (bit=0). If you know it's the last block, you should know when to stop reading compressed data and start reading raw data... – Michael Sep 15 '17 at 12:30
  • Unfortunately .NET's `DeflateStream` does not expose any such thing. Does this mean that I need to implement my own DeflateStream as well? D: – wingerse Sep 15 '17 at 12:36
  • @WingerSendon According to the reference sources of http://referencesource.microsoft.com/#System/sys/System/IO/compression/DeflateStream.cs and http://referencesource.microsoft.com/#System/sys/System/IO/compression/Inflater.cs the Inflater class has built in detection of the trailing checksum. Have a look at InflaterState.StartReadingFooter in the Inflater class... – Michael Sep 15 '17 at 12:47
  • Inflater is internal, and GZipStream also uses internal methods, so I can't even copy GZipStream's approach. – wingerse Sep 15 '17 at 13:00
  • Not only that, I also couldn't find a zlib library on NuGet that supports .NET Core. – wingerse Sep 15 '17 at 13:03
  • @WingerSendon You could use reflection to get at the internals, but since they are internals they could be changed or removed at any time... – Michael Sep 15 '17 at 15:24
  • That's just hacky. I would rather not implement a zlib stream at all and instead use a combination of MemoryStream and writing the header and footer myself. It would work fine in my case, where I'm reading packets from streams into buffers anyway. Thanks for the help :) – wingerse Sep 15 '17 at 15:28
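
The buffer-based workaround mentioned in the comments could look roughly like this. It is a sketch only, assuming the whole zlib stream is already in a byte array and that no preset dictionary is used; the `ZlibBuffer` class and its method names are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ZlibBuffer
{
    // Adler-32 as defined in RFC 1950: two running sums modulo 65521.
    public static uint Adler32(byte[] data)
    {
        uint a = 1, b = 0;
        foreach (byte x in data)
        {
            a = (a + x) % 65521;
            b = (b + a) % 65521;
        }
        return (b << 16) | a;
    }

    // Decompresses a complete zlib stream held in memory and verifies its checksum.
    public static byte[] Decompress(byte[] zlib)
    {
        if (zlib.Length < 6)
            throw new InvalidDataException("Too short to be a zlib stream");
        if ((zlib[1] & 0x20) != 0)
            throw new NotSupportedException("Preset dictionaries are not handled in this sketch");

        // Hand DeflateStream only the raw deflate data:
        // skip CMF and FLG (2 bytes) and stop before the Adler-32 trailer (4 bytes).
        using (var raw = new MemoryStream(zlib, 2, zlib.Length - 6))
        using (var deflate = new DeflateStream(raw, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            deflate.CopyTo(output);
            byte[] result = output.ToArray();

            // The trailer is stored big-endian.
            int n = zlib.Length;
            uint expected = ((uint)zlib[n - 4] << 24) | ((uint)zlib[n - 3] << 16)
                          | ((uint)zlib[n - 2] << 8) | zlib[n - 1];
            if (Adler32(result) != expected)
                throw new InvalidDataException("Adler-32 mismatch");
            return result;
        }
    }
}
```

Because the MemoryStream ends exactly where the compressed data ends, it does not matter that DeflateStream reads ahead. This does not help with a NetworkStream unless the packet length is known up front. On .NET 6 and later, `System.IO.Compression.ZLibStream` handles the zlib framing directly.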

Looking through the source code here, DeflateStream reads the input data in 8K blocks :-/, so if your input file is small, it will look like it's reading up to the end of the file.

However, DeflateStream instances have a private member _inflater, which has a private member _zlibStream, which has a property AvailIn that returns the number of bytes still available in the input buffer. In other words, this is the number of bytes that were read beyond the end of the compressed data. By using reflection to get at these private parts, we can move the file pointer backwards by that many bytes, returning it to where it should have been left, i.e. just past the end of the compressed data. Note that this requires a seekable input stream.

This code is F#, but it should be clear what's going on:

// zstream is the DeflateStream instance
let inflater = typeof<DeflateStream>.GetField( "_inflater", BindingFlags.NonPublic ||| BindingFlags.Instance ).GetValue( zstream )
let zlibStream = inflater.GetType().GetField( "_zlibStream", BindingFlags.NonPublic ||| BindingFlags.Instance ).GetValue( inflater )
let availInMethod = zlibStream.GetType().GetProperty( "AvailIn" ).GetMethod
let availIn: uint32 = unbox( availInMethod.Invoke( zlibStream, null ) )
// inp is the input file
inp.Seek( -(int64 availIn), SeekOrigin.Current ) |> ignore
taka