
I have the following method which gets a memory stream and compresses it using ZipArchive:

    public async Task<MemoryStream> CompressStreamAsync(MemoryStream content, string fileName)
    {
        var memoryStream = new MemoryStream();
        memoryStream.Seek(0, SeekOrigin.Begin);

        using var zipArchive = new ZipArchive(memoryStream, ZipArchiveMode.Create, true);
        var zipArchiveEntry = zipArchive.CreateEntry(fileName);

        using var streamWriter = new StreamWriter(zipArchiveEntry.Open());
        var charArrayContent = Encoding.UTF8.GetString(content.GetBuffer())
            .ToCharArray();

        await streamWriter.WriteAsync(charArrayContent, 0, charArrayContent.Length);

        return memoryStream;
    }

When I extract the archive, the archived file is larger than the original, and I noticed whitespace at the end of the file that wasn't in the original. Any ideas why this might happen?

Best regards.

  • Are you compressing files that are already in a compressed format, like .jpeg, etc.? – Crowcoder May 24 '22 at 12:41
  • 1) The `ZipArchive` and `ZipArchiveEntry` are basically a small potted filesystem and thus have some overhead associated with their internal structures. That likely accounts for the padding at the end. 2) If the original data was highly random, compression can result in a larger file, see [Why is a 7zipped file larger than the raw file?](https://superuser.com/q/464315/1031694). – dbc May 24 '22 at 12:54
  • Why do you decode & encode the `content` stream? Why not just use [`Stream.CopyToAsync()`](https://learn.microsoft.com/en-us/dotnet/api/system.io.stream.copytoasync)? Or even just [`CopyTo()`](https://learn.microsoft.com/en-us/dotnet/api/system.io.stream.copyto) since everything is in memory? – dbc May 24 '22 at 12:56
  • _["**MemoryStream.GetBuffer** Note that the buffer contains allocated bytes which might be unused. For example, if the string "test" is written into the MemoryStream object, the length of the buffer returned from GetBuffer is 256, not 4"](https://learn.microsoft.com/en-us/dotnet/api/system.io.memorystream.getbuffer?view=net-6.0)_. So that's one reason why it appears larger. –  May 24 '22 at 13:02
  • @MickyD - Oh you're right! They need to use `Encoding.UTF8.GetString(content.GetBuffer(), 0, checked((int)content.Length))`! Or just `CopyToAsync()`. – dbc May 24 '22 at 13:04
  • @dbc ya, or if you don't mind duplication use `.ToArray()` for the simplicity. –  May 24 '22 at 13:12
  • Oh, a possible duplicate: [Unzipped data being padded with '\0' when using DotNetZip and MemoryStream](https://stackoverflow.com/q/13956170/3744182). – dbc May 24 '22 at 13:20
  • @AdrianChiritescu - Did using `content.Length` or `content.ToArray()` resolve the problem? – dbc May 24 '22 at 15:24
  • @dbc Apologies for the late reply; I appreciate your concern and your help. Yes, `content.ToArray()` seems to solve the problem, thank you! Thank you all for the good information you've provided! – Adrian Chiritescu May 24 '22 at 17:41
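
The fix discussed in the comments could look roughly like the sketch below. It copies the bytes straight into the zip entry with `CopyToAsync()`, so the unused tail of the buffer returned by `GetBuffer()` never reaches the archive, and non-text content is not altered by a UTF-8 decode/encode round trip. The class name `ZipHelper` and the local names `output` and `entryStream` are made up for illustration; this is one possible shape, not a tested drop-in replacement.

```csharp
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

// Hypothetical helper class, named here only for the example.
public static class ZipHelper
{
    public static async Task<MemoryStream> CompressStreamAsync(MemoryStream content, string fileName)
    {
        var output = new MemoryStream();

        // leaveOpen: true keeps `output` usable after the archive is disposed.
        // Disposing the archive is what finishes writing the zip structure,
        // so it happens before `output` is handed back to the caller.
        using (var zipArchive = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            var entry = zipArchive.CreateEntry(fileName);

            using (var entryStream = entry.Open())
            {
                // Copy only the bytes actually written to `content`
                // (content.Length of them), not the whole internal buffer.
                content.Position = 0;
                await content.CopyToAsync(entryStream);
            }
        }

        // Rewind so the caller can read the archive from the start.
        output.Position = 0;
        return output;
    }
}
```

If a byte array is needed instead of a stream copy, `content.ToArray()` also returns exactly `content.Length` bytes, at the cost of an extra allocation.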
