
I have to decompress some gzip text in a .NET 6 app. However, on a string that is 20,627 characters long, it only decompresses about a third of it. The code I am using works for this string in .NET 5 and .NET Core 3.1, as well as for smaller compressed strings.

public static string Decompress(this string compressedText)
{
    var gZipBuffer = Convert.FromBase64String(compressedText);
    using var memoryStream = new MemoryStream();
    int dataLength = BitConverter.ToInt32(gZipBuffer, 0);
    memoryStream.Write(gZipBuffer, 4, gZipBuffer.Length - 4);
    var buffer = new byte[dataLength];
    memoryStream.Position = 0;
    using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Decompress))
    {
        gZipStream.Read(buffer, 0, buffer.Length);
    }
    return Encoding.UTF8.GetString(buffer);
}

The results look something like this:

Start of amazing text..... ...text is fine until 33,619, after that it is all NULLNULLNULLNULL

The rest of the output after character 33,618 is just nulls.

I have no idea why this is happening.

Edit: I updated this when I found the issue was not Blazor but in fact .NET 6. I took a project that was working in .NET Core 3.1, changed nothing other than compiling for .NET 6, and got the same error. The update reflects this.

Edit 2: Just tested, and it works in .NET 5, so it is just .NET 6 in which this error happens.

Xaphann
    I would open a github ticket for .Net 6. – NetMage Jan 31 '22 at 22:19
    Does it help if you add .Flush() after .Write(..)? – Wiktor Zychla Jan 31 '22 at 22:21
  • @NetMage well that isn't what I was hoping for. Going to do it now – Xaphann Jan 31 '22 at 22:26
  • @WiktorZychla that doesn't make a difference – Xaphann Jan 31 '22 at 22:26
    1) as far as "byte size", 20,627 characters isn't really "large" at all. 2) The .zip format checks both the *BEGINNING* and *END* of the file before uncompressing, so it's unlikely the data itself is corrupt. – paulsm4 Jan 31 '22 at 22:27
  • @paulsm4 I agree that it isn't that large, and I know the data isn't corrupted. I don't know the limit, I just know that about 6,000 characters works and 20,000 does not – Xaphann Jan 31 '22 at 22:30
  • Show an actual reproducible piece of code that has that problem, it's hard to tell with just the decoding function where you messed it up. – Blindy Jan 31 '22 at 22:30
  • @Blindy what do you mean by a reproducible piece of code? Do you want me to put 20,000+ characters of compressed text into the StackOverflow editor? It works in .NET Core 3.1. I can take a project just change it from 3.1 to 6 and get the error. Switch it back to 3.1 and it works – Xaphann Jan 31 '22 at 22:39
  • @Blindy To me, the fact that the same code works in .Net 3.1 points to a bug in .Net 6, or some type of change. – NetMage Jan 31 '22 at 22:43
    Can you try .Net Core 5? – NetMage Jan 31 '22 at 22:44
  • @NetMage just tried it in .NET 5 and the code works. Updated – Xaphann Jan 31 '22 at 22:47
    Perhaps [this change](https://learn.microsoft.com/en-us/dotnet/core/compatibility/core-libraries/6.0/partial-byte-reads-in-streams) is what you are running into - you don't seem to be using the return value from `Read()`. – NetMage Jan 31 '22 at 22:48
  • @NetMage from a quick read that sounds exactly like my issue. I am going to have to screw around with this tomorrow... Thanks! – Xaphann Jan 31 '22 at 22:51

1 Answer


Just confirmed that the breaking-change article linked in the comments below the question contains the valid clue about the issue.

Corrected code would be:

string Decompress(string compressedText)
{
    var gZipBuffer = Convert.FromBase64String(compressedText);

    using var memoryStream = new MemoryStream();
    int dataLength = BitConverter.ToInt32(gZipBuffer, 0);
    memoryStream.Write(gZipBuffer, 4, gZipBuffer.Length - 4);

    var buffer = new byte[dataLength];
    memoryStream.Position = 0;

    using var gZipStream = new GZipStream(memoryStream, CompressionMode.Decompress);

    int totalRead = 0;
    while (totalRead < buffer.Length)
    {
        int bytesRead = gZipStream.Read(buffer, totalRead, buffer.Length - totalRead);
        if (bytesRead == 0) break;
        totalRead += bytesRead;
    }

    return Encoding.UTF8.GetString(buffer);
}

This approach changes

    gZipStream.Read(buffer, 0, buffer.Length);

to

    int totalRead = 0;
    while (totalRead < buffer.Length)
    {
        int bytesRead = gZipStream.Read(buffer, totalRead, buffer.Length - totalRead);
        if (bytesRead == 0) break;
        totalRead += bytesRead;
    }

which correctly takes Read's return value into account: starting with .NET 6, Read may return fewer bytes than requested even when more data remains, so the result must be accumulated in a loop until Read returns 0.

Without the change, the issue is easily repeatable on any string random enough to produce a gzip stream longer than roughly 10 KB.
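To see the partial read in isolation, here is a minimal standalone sketch (sizes and variable names are illustrative, not from the question):

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

// Build a compressible payload large enough that the inflater
// cannot hand it all back in a single internal chunk.
var text = string.Join(",", Enumerable.Range(0, 5000));
var bytes = Encoding.UTF8.GetBytes(text);

using var ms = new MemoryStream();
using (var gz = new GZipStream(ms, CompressionMode.Compress, leaveOpen: true))
    gz.Write(bytes, 0, bytes.Length);

ms.Position = 0;
using var inflate = new GZipStream(ms, CompressionMode.Decompress);
var buffer = new byte[bytes.Length];

// On .NET 6 this single Read may return fewer bytes than requested even
// though more data remains; earlier runtimes usually filled the buffer.
int firstRead = inflate.Read(buffer, 0, buffer.Length);
Console.WriteLine($"requested {buffer.Length}, first Read returned {firstRead}");
```

Whatever the first call returns, looping until Read returns 0 always recovers the full payload, which is exactly what the corrected code above does.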

Here's the compressor, if anyone's interested in testing this on their own:

string Compress(string plainText)
{
    var buffer = Encoding.UTF8.GetBytes(plainText);
    using var memoryStream = new MemoryStream();

    var lengthBytes = BitConverter.GetBytes((int)buffer.Length);
    memoryStream.Write(lengthBytes, 0, lengthBytes.Length);

    using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress, leaveOpen: true))
    {
        gZipStream.Write(buffer, 0, buffer.Length);
    } // disposing the GZipStream (not just Flush) writes the gzip trailer

    var gZipBuffer = memoryStream.ToArray();

    return Convert.ToBase64String(gZipBuffer);
}
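As an aside (a sketch, not part of the original answer): `Stream.CopyTo` performs the read loop internally until Read returns 0, so a decompressor that doesn't need the length prefix to pre-size a buffer can avoid the manual loop entirely:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static string DecompressViaCopyTo(string compressedText)
{
    var gZipBuffer = Convert.FromBase64String(compressedText);

    // Skip the 4-byte length prefix; CopyTo reads until end of stream,
    // so the prefix is only useful for pre-sizing a buffer.
    using var memoryStream = new MemoryStream(gZipBuffer, 4, gZipBuffer.Length - 4);
    using var gZipStream = new GZipStream(memoryStream, CompressionMode.Decompress);
    using var output = new MemoryStream();

    gZipStream.CopyTo(output); // loops internally until Read returns 0
    return Encoding.UTF8.GetString(output.ToArray());
}
```

This behaves the same on .NET 6 as on earlier runtimes, because CopyTo never assumed that one Read fills the whole buffer.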
Wiktor Zychla