Assume we are given an API function f(Stream s) that stores the binary data contained in a stream in a database. I want to store a file in the database using f, but I want to compress the data first. I thought I could do the following:

var fileStream= File.OpenRead(path);
using(var dstream = new DeflateStream(fileStream, CompressionLevel.Optimal))
   f(dstream);

But it seems that, when compressing, DeflateStream only writes to the wrapped stream (here fileStream) and cannot read from it. All the examples I found use the stream's CopyTo method to compress or decompress. That would mean I have to keep a copy of the compressed data in memory before passing it to f, for instance like this:

var memoryStream = new MemoryStream();
using(var fileStream= File.OpenRead(path))
  using(var dstream = new DeflateStream(memoryStream, CompressionLevel.Optimal, leaveOpen: true)) {
    fileStream.CopyTo(dstream);
  }
// dstream is disposed above so that all compressed data is flushed to memoryStream
memoryStream.Seek(0, SeekOrigin.Begin);
f(memoryStream);

Is there any way to avoid using the MemoryStream?

Update: Since some commenters persistently asked for one, here is a complete example:

using System;
using System.IO;
using System.IO.Compression;

public class ThisWouldBeTheDatabaseClient {
  public void f(Stream s) {
    // some implementation I don't have access to
    // The only thing I know is that it reads data from the stream in some way.
    var buffer = new byte[10];
    s.Read(buffer,0,10);
  }
}

public class Program {
  public static void Main() {
    var dummyDatabaseClient = new ThisWouldBeTheDatabaseClient();
    var dataBuffer = new byte[1000];
    var fileStream= new MemoryStream( dataBuffer ); // would be "File.OpenRead(path)" in real case
    using(var dstream = new DeflateStream(fileStream, CompressionLevel.Optimal))
        dummyDatabaseClient.f(dstream);
  }
}

The read operation in the dummy implementation of f throws an exception: InvalidOperationException: Reading from the compression stream is not supported. To conclude the discussion in the comments, I assume that the desired behaviour is not possible with the built-in DeflateStream, but there are alternatives in third-party libraries.
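
The exception is consistent with what the built-in class reports about itself. The following is a minimal sketch, assuming only the standard System.IO.Compression API; the names DeflateDirectionCheck and CompressModeCapabilities are made up for illustration. A DeflateStream constructed with a CompressionLevel is in compression mode, where it should report CanRead as false and CanWrite as true.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of the question's code.
public static class DeflateDirectionCheck
{
    // Returns the CanRead/CanWrite flags of a built-in DeflateStream
    // that was opened for compression.
    public static (bool CanRead, bool CanWrite) CompressModeCapabilities()
    {
        using (var backing = new MemoryStream())
        using (var deflate = new DeflateStream(backing, CompressionLevel.Optimal))
        {
            // In compression mode the stream only accepts writes and
            // pushes the compressed bytes into the backing stream.
            return (deflate.CanRead, deflate.CanWrite);
        }
    }
}
```

Calling DeflateDirectionCheck.CompressModeCapabilities() makes it possible to detect the problem up front instead of waiting for f to call Read and fail.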

MarkusParker
  • The code you've provided wouldn't compile - there are missing brackets and braces in various places. Please provide a [mcve]. – Jon Skeet Dec 14 '17 at 08:51
  • (I'd also very strongly advise you to use braces even for single-statement if/using/etc statements.) – Jon Skeet Dec 14 '17 at 08:53
  • Note that these edits still aren't creating a [mcve]. We can't copy, paste, compile, run and see the problem. – Jon Skeet Dec 14 '17 at 08:55
  • @JonSkeet I think you are too harsh on this question. It looks quite clear, and it cannot contain a minimal example because it's not about code that works incorrectly, but about how to write code that achieves a specified goal (avoiding buffering into an additional `MemoryStream`). – Evk Dec 14 '17 at 09:00
  • @Evk: It could easily contain a minimal example that doesn't work. (We don't even know the *way* in which the first code doesn't work. Does it throw an exception? Give the wrong answer?) Given a minimal example that doesn't work, I'd be happy to try to modify it to work without creating the extra copy (probably using SharpCompress). But I'm not going to go to the work of creating a test rig for that if the OP can't be bothered to. Creating a genuine [mcve] would IMO a) make it easier to help the OP; b) clarify the question; c) make the question more useful to others. – Jon Skeet Dec 14 '17 at 09:02
  • The DeflateStream represents the uncompressed stream data and the wrapped stream the compressed stream data. - Yes, you will always need another stream for compression (here the MemoryStream). If you worry about memory consumption use a FileStream with a temporary file – Sir Rufo Dec 14 '17 at 09:05
  • @SirRufo: Or use a different library, of course. I believe SharpCompress's DeflateStream would be okay with this, but until we're in a situation where I can easily test a proposed change, we won't know... – Jon Skeet Dec 14 '17 at 09:40
  • Finally, we have a complete example. Not sure why you couldn't have provided that 3 hours ago, but at least it makes it easy to show that my answer works... – Jon Skeet Dec 14 '17 at 12:24

2 Answers

The DeflateStream is just a wrapper and needs a stream for the compressed data. So you have to use two streams.

Is there any way to avoid using the MemoryStream?

Yes.

You need a stream to store temporary data without consuming (too much) memory. Instead of using a MemoryStream, you can use a temporary file for that.

For lazy people (like me, in the first place), let's create a class that behaves mostly like a MemoryStream:

public class TempFileStream : FileStream
{
    public TempFileStream() : base(
        path: Path.GetTempFileName(),
        mode: FileMode.Open,
        access: FileAccess.ReadWrite,
        share: FileShare.None,
        bufferSize: 4096,
        options: FileOptions.DeleteOnClose | FileOptions.Asynchronous | FileOptions.Encrypted | FileOptions.RandomAccess)
    {
    }
}

The important part here is FileOptions.DeleteOnClose which will remove the temporary file when you dispose the stream.
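
To illustrate that claim, here is a minimal sketch that opens a temporary file with FileOptions.DeleteOnClose and checks whether the file exists before and after disposal. The names TempFileDemo and Run are hypothetical, and the sketch assumes a platform and file system where File.Exists reflects the deletion immediately after Dispose.

```csharp
using System;
using System.IO;

// Hypothetical demo class, not part of the answer's code.
public static class TempFileDemo
{
    // Reports whether the temporary file exists while the stream is open
    // and whether it still exists after the stream has been disposed.
    public static (bool ExistsWhileOpen, bool ExistsAfterDispose) Run()
    {
        // GetTempFileName creates an empty file and returns its path.
        string tempPath = Path.GetTempFileName();
        bool existsWhileOpen;

        using (var stream = new FileStream(
            tempPath,
            FileMode.Open,
            FileAccess.ReadWrite,
            FileShare.None,
            4096,
            FileOptions.DeleteOnClose))
        {
            stream.WriteByte(42);
            existsWhileOpen = File.Exists(tempPath);
        }

        // The stream has been disposed, so the file should be gone now.
        return (existsWhileOpen, File.Exists(tempPath));
    }
}
```

If the assumption holds, TempFileDemo.Run() reports that the file exists while the stream is open and no longer exists afterwards, so no explicit cleanup code is needed.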

And then use it

using (var compressedStream = new TempFileStream())
{
    using (var deflateStream = new DeflateStream(
        stream: compressedStream,
        compressionLevel: CompressionLevel.Optimal,
        leaveOpen: true))
    using (var fileStream = File.OpenRead(path))
    {
        fileStream.CopyTo(deflateStream);
    }

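    // Rewind so that f reads the compressed data from the beginning
    compressedStream.Seek(0, SeekOrigin.Begin);
    f(compressedStream);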
}
Sir Rufo
  • And why `FileOptions.Encrypted`? – Evk Dec 14 '17 at 10:04
  • Just for being paranoid :o) – Sir Rufo Dec 14 '17 at 10:07
  • I think there's a better approach to this, which is to use an alternative to the built-in `DeflateStream`. No need for a temporary file at all. I strongly suspect that the `DeflateStream` in `SharpCompress` will work. If the OP ever provides a [mcve], we'll see... – Jon Skeet Dec 14 '17 at 10:28
  • @JonSkeet After a short look at SharpCompress the DeflateStream needs a stream as well - it does not solve the main problem: process memory consumption when using MemoryStream – Sir Rufo Dec 14 '17 at 10:42
  • @SirRufo: Not sure what you mean by "needs a stream" - yes, it wraps another stream, but so long as it does that in the way that's required, that would be fine. The built-in `DeflateStream` doesn't do that. Again, if the OP posted a complete example, it would be a lot easier to help. Fundamentally there's no reason a backing file (or complete copy of data in memory) *should* be needed here. – Jon Skeet Dec 14 '17 at 10:43
  • @JonSkeet OP said: *But this would mean that I have to keep a copy of the compressed data in memory* - that is the problem the OP tries to solve and that will not be solved by using SharpCompress.DeflateStream. The compressed data has to be stored somewhere, and if not in process memory there is only one option left (I am aware of): use a file – Sir Rufo Dec 14 '17 at 10:46
  • @SirRufo: No, again there should be no need to store the *complete* stream in memory. I *suspect* that the OP's original code would work fine with SharpCompress. But the OP appears not to really care about improving their question. – Jon Skeet Dec 14 '17 at 10:51
  • @JonSkeet This [sample](https://dotnetfiddle.net/MUTpPb) shows when the compressed stream contains the **whole** data: after disposing the DeflateStream. BTW SharpCompress.DeflateStream did not do anything, which can be seen too – Sir Rufo Dec 14 '17 at 11:36
  • Your sample code is like the *second* example given by the OP though. I'm saying that I believe it will make the *first* example work. And the reason you're getting a length of 0 for SharpCompress is because you didn't "rewind" `dataStream` in `Main`. – Jon Skeet Dec 14 '17 at 11:42
  • @JonSkeet gotcha => [fixed sample](https://dotnetfiddle.net/tpULAW) - but you have to dispose the DeflateStream before you can consume the compressed stream => compressed stream has to be stored somewhere – Sir Rufo Dec 14 '17 at 11:45
  • @SirRufo: I don't believe so. See https://dotnetfiddle.net/lPQWDl - I'll change ConsumeStream to "reinflate" the data to prove that it's correct. – Jon Skeet Dec 14 '17 at 11:47
  • @JonSkeet Are you sure that **without** compression you can put 50.000 Bytes into 33.792 Bytes? I would expect a length >= 50.000 Bytes as the result from .net DeflateStream – Sir Rufo Dec 14 '17 at 11:50
  • @SirRufo: Still investigating. But the point is *to* compress - that's the idea, that we create a stream which contains the compressed data. Or are you expecting it to be uncompressible due to the pattern of data you've created? – Jon Skeet Dec 14 '17 at 11:54
  • Ah - perhaps you mean the "CompressionLevel.None" part? Hadn't spotted that :) Still working... – Jon Skeet Dec 14 '17 at 11:55
  • Okay, I believe it's working now. See https://dotnetfiddle.net/lPQWDl. With CompressionLevel.None, the "compressed" stream has a length of 50010. With CompressionLevel.Default, it has a length of 519. In both cases, that stream contains the correct data: 50000 bytes in the original pattern. There's still an oddity if you inflate the compressed stream directly, but I haven't found out whether that's a bug in the .NET DeflateStream or not. (It feels like it probably is, but it's hard to tell for sure.) – Jon Skeet Dec 14 '17 at 12:02
  • Ah, so we have to copy the stream to get the compressed data, right? – Sir Rufo Dec 14 '17 at 12:04
  • Well you have to *read* from the stream to get the compressed data, but that's entirely reasonable. The ConsumeStream method could write it straight to a database in a streaming fashion, for example... in which case there will never be a complete copy of the data in memory, nor a "temporary" copy on disk. That's what the OP is after. I'm still looking at why decompressing immediately fails. I suspect that .NET's DeflateStream has a bug when working with a non-seekable stream. – Jon Skeet Dec 14 '17 at 12:18
  • I know that it is reasonable, but let's go back to where we started: the whole compressed data will consume process memory when we are using MemoryStream, right? => The OP does not want to waste process memory with the compressed data. Resolution: use a file for it – Sir Rufo Dec 14 '17 at 12:21
  • That issue goes away when I run it locally, so I suspect that either maybe dotnetfiddle is running Mono and there's a bug in that version, or it's been fixed in a more recent version of .NET. – Jon Skeet Dec 14 '17 at 12:21
  • No, no, no. The point is we don't *need* a `MemoryStream`. We're only using that for diagnostics in the example. It doesn't have to be there at all. I'll post an answer, although I still resent the fact that *you* had to come up with the complete example rather than the OP bothering to do so. – Jon Skeet Dec 14 '17 at 12:22
  • Thanks for Sir Rufo's answer, but the solution Jon Skeet suggests using SharpCompress is closer to what I want. I really appreciate you spending your time on my question. – MarkusParker Dec 14 '17 at 12:32

You can use SharpCompress for this. Its DeflateStream allows you to read the compressed data on the fly, which is exactly what you want.

Here's a complete example based on Sir Rufo's:

using System;
using System.IO;
using SharpCompress.Compressors;
using SharpCompress.Compressors.Deflate;
using System.Linq;

public class Program
{
    public static void Main()
    {
        var dataBuffer = Enumerable.Range(1, 50000).Select(e => (byte)(e % 256)).ToArray();

        using (var dataStream = new MemoryStream(dataBuffer))
        {
            // Note: this refers to SharpCompress.Compressors.Deflate.DeflateStream                
            using (var deflateStream = new DeflateStream(dataStream, CompressionMode.Compress))
            {
                ConsumeStream(deflateStream);
            }
        }
    }

    public static void ConsumeStream(Stream stream)
    {
        // Let's just prove we can reinflate to the original data...
        byte[] data;
        using (var decompressed = new MemoryStream())
        {
            using (var decompressor = new DeflateStream(stream, CompressionMode.Decompress))
            {
                decompressor.CopyTo(decompressed);
            }
            data = decompressed.ToArray();
        }
        Console.WriteLine("Reinflated size: " + data.Length);
        int errors = 0;
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] != (i + 1) % 256)
            {
                errors++;
            }
        }
        Console.WriteLine("Total errors: " + errors);
    }
}

Or using your sample code:

using System;
using System.IO;
using SharpCompress.Compressors;
using SharpCompress.Compressors.Deflate;

public class ThisWouldBeTheDatabaseClient {
  public void f(Stream s) {
    // some implementation I don't have access to
    // The only thing I know is that it reads data from the stream in some way.
    var buffer = new byte[10];
    s.Read(buffer,0,10);
  }
}

public class Program {
  public static void Main() {
    var dummyDatabaseClient = new ThisWouldBeTheDatabaseClient();
    var dataBuffer = new byte[1000];
    var fileStream= new MemoryStream( dataBuffer ); // would be "File.OpenRead(path)" in real case
    using(var dstream = new DeflateStream(
        fileStream, CompressionMode.Compress, CompressionLevel.BestCompression))
        dummyDatabaseClient.f(dstream);
  }
}

This now doesn't throw an exception, and will serve the compressed data.
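
The dummy f above reads only 10 bytes once. A real consumer would presumably read until the end of the stream, and that is where a readable compressing stream pays off: only one small buffer is alive at a time. Below is a minimal sketch of such a consumer; StreamingConsumer and DrainInChunks are hypothetical names and not part of any database client or of SharpCompress. It relies only on Stream.Read, so it works with any readable Stream.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for a database client that consumes a stream.
public static class StreamingConsumer
{
    // Reads the source in fixed-size chunks until the end of the stream
    // and returns the total number of bytes seen. A real client would
    // send each chunk to the database instead of just counting it.
    public static long DrainInChunks(Stream source, int chunkSize = 4096)
    {
        var buffer = new byte[chunkSize];
        long total = 0;
        int read;

        // Read returns 0 only when the end of the stream is reached.
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            total += read;
        }

        return total;
    }
}
```

Passing the SharpCompress dstream from the example above to StreamingConsumer.DrainInChunks would pull the compressed bytes chunk by chunk, without a MemoryStream or a temporary file holding the complete compressed data.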

Jon Skeet