
I have to decompress some gzip text in a .NET 6 app. However, on a string that is 20,627 characters long, it only decompresses about a third of it. The code I am using works for this string in .NET 5 and .NET Core 3.1, as well as for smaller compressed strings.

public static string Decompress(this string compressedText)
{
    var gZipBuffer = Convert.FromBase64String(compressedText);
    using var memoryStream = new MemoryStream();
    int dataLength = BitConverter.ToInt32(gZipBuffer, 0);
    memoryStream.Write(gZipBuffer, 4, gZipBuffer.Length - 4);
    var buffer = new byte[dataLength];
    memoryStream.Position = 0;
    using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Decompress))
    {
        gZipStream.Read(buffer, 0, buffer.Length);
    }
    return Encoding.UTF8.GetString(buffer);
}

The results look something like this:

Start of amazing text..... ...text is fine until 33,619, after that it is all NULLNULLNULLNULL

The rest of the output after character 33,618 is just nulls.

I have no idea why this is happening.

Edit: I updated this when I found the issue was not Blazor but in fact .NET 6. I took a project that was working in .NET Core 3.1, changed nothing other than compiling for .NET 6, and got the same error. The update reflects this.

Edit 2: Just tested, and it works in .NET 5, so it is just .NET 6 in which this error happens.

Xaphann
    I would open a github ticket for .Net 6. – NetMage Jan 31 '22 at 22:19
    Does it help if you add .Flush() after .Write(..)? – Wiktor Zychla Jan 31 '22 at 22:21
  • @NetMage well that isn't what I was hoping for. Going to do it now – Xaphann Jan 31 '22 at 22:26
  • @WiktorZychla that doesn't make a difference – Xaphann Jan 31 '22 at 22:26
    1) as far as "byte size", 20,627 characters isn't really "large" at all. 2) The .zip format checks both the *BEGINNING* and *END* of the file before uncompressing, so it's unlikely the data itself is corrupt. – paulsm4 Jan 31 '22 at 22:27
  • @paulsm4 I agree that it isn't that large, and I know the data isn't corrupted. I don't know the limit, I just know that about 6,000 characters works and 20,000 does not – Xaphann Jan 31 '22 at 22:30
  • Show an actual reproducible piece of code that has that problem, it's hard to tell with just the decoding function where you messed it up. – Blindy Jan 31 '22 at 22:30
  • @Blindy what do you mean by a reproducible piece of code? Do you want me to put 20,000+ characters of compressed text into the StackOverflow editor? It works in .NET Core 3.1. I can take a project just change it from 3.1 to 6 and get the error. Switch it back to 3.1 and it works – Xaphann Jan 31 '22 at 22:39
  • @Blindy To me, the fact that the same code works in .Net 3.1 points to a bug in .Net 6, or some type of change. – NetMage Jan 31 '22 at 22:43
    Can you try .Net Core 5? – NetMage Jan 31 '22 at 22:44
  • @NetMage just tried it in .NET 5 and the code works. Updated – Xaphann Jan 31 '22 at 22:47
    Perhaps [this change](https://learn.microsoft.com/en-us/dotnet/core/compatibility/core-libraries/6.0/partial-byte-reads-in-streams) is what you are running into - you don't seem to be using the return value from `Read()`. – NetMage Jan 31 '22 at 22:48
  • @NetMage from a quick read that sounds exactly like my issue. I am going to have to screw around with this tomorrow... Thanks! – Xaphann Jan 31 '22 at 22:51

1 Answer


Just confirmed that the breaking-change article linked in the comments below the question contains the valid clue about the issue.

Corrected code would be:

string Decompress(string compressedText)
{
    var gZipBuffer = Convert.FromBase64String(compressedText);

    using var memoryStream = new MemoryStream();
    int dataLength = BitConverter.ToInt32(gZipBuffer, 0);
    memoryStream.Write(gZipBuffer, 4, gZipBuffer.Length - 4);

    var buffer = new byte[dataLength];
    memoryStream.Position = 0;

    using var gZipStream = new GZipStream(memoryStream, CompressionMode.Decompress);

    int totalRead = 0;
    while (totalRead < buffer.Length)
    {
        int bytesRead = gZipStream.Read(buffer, totalRead, buffer.Length - totalRead);
        if (bytesRead == 0) break;
        totalRead += bytesRead;
    }

    return Encoding.UTF8.GetString(buffer);
}

This approach changes

    gZipStream.Read(buffer, 0, buffer.Length);

to

    int totalRead = 0;
    while (totalRead < buffer.Length)
    {
        int bytesRead = gZipStream.Read(buffer, totalRead, buffer.Length - totalRead);
        if (bytesRead == 0) break;
        totalRead += bytesRead;
    }

which correctly takes Read's return value into account: starting with .NET 6, Read may return fewer bytes than requested even when more data remains, so the result must be accumulated in a loop until Read returns 0.

Without the change, the issue is easily repeatable on any string random enough to produce a gzip stream longer than roughly 10 KB.
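To see the partial read in isolation, here is a minimal standalone sketch (sizes and variable names are illustrative, not from the question):

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

// Build a compressible payload large enough that the inflater
// cannot hand it all back in a single internal chunk.
var text = string.Join(",", Enumerable.Range(0, 5000));
var bytes = Encoding.UTF8.GetBytes(text);

using var ms = new MemoryStream();
using (var gz = new GZipStream(ms, CompressionMode.Compress, leaveOpen: true))
    gz.Write(bytes, 0, bytes.Length);

ms.Position = 0;
using var inflate = new GZipStream(ms, CompressionMode.Decompress);
var buffer = new byte[bytes.Length];

// On .NET 6 this single Read may return fewer bytes than requested even
// though more data remains; earlier runtimes usually filled the buffer.
int firstRead = inflate.Read(buffer, 0, buffer.Length);
Console.WriteLine($"requested {buffer.Length}, first Read returned {firstRead}");
```

Whatever the first call returns, looping until Read returns 0 always recovers the full payload, which is exactly what the corrected code above does.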

Here's the compressor, if anyone's interested in testing this on their own:

string Compress(string plainText)
{
    var buffer = Encoding.UTF8.GetBytes(plainText);
    using var memoryStream = new MemoryStream();

    var lengthBytes = BitConverter.GetBytes((int)buffer.Length);
    memoryStream.Write(lengthBytes, 0, lengthBytes.Length);

    using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress, leaveOpen: true))
    {
        gZipStream.Write(buffer, 0, buffer.Length);
    } // disposing the GZipStream (not just Flush) writes the gzip trailer

    var gZipBuffer = memoryStream.ToArray();

    return Convert.ToBase64String(gZipBuffer);
}
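As an aside (a sketch, not part of the original answer): `Stream.CopyTo` performs the read loop internally until Read returns 0, so a decompressor that doesn't need the length prefix to pre-size a buffer can avoid the manual loop entirely:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static string DecompressViaCopyTo(string compressedText)
{
    var gZipBuffer = Convert.FromBase64String(compressedText);

    // Skip the 4-byte length prefix; CopyTo reads until end of stream,
    // so the prefix is only useful for pre-sizing a buffer.
    using var memoryStream = new MemoryStream(gZipBuffer, 4, gZipBuffer.Length - 4);
    using var gZipStream = new GZipStream(memoryStream, CompressionMode.Decompress);
    using var output = new MemoryStream();

    gZipStream.CopyTo(output); // loops internally until Read returns 0
    return Encoding.UTF8.GetString(output.ToArray());
}
```

This behaves the same on .NET 6 as on earlier runtimes, because CopyTo never assumed that one Read fills the whole buffer.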
Wiktor Zychla