
I have some large data files which I can retrieve in chunks of, say, 32 KB using an API specially designed for this. One usage of the API can be the following:

LargeFileAPI lfa = new LargeFileAPI("file1.bin");
bool moredata = true;
List<byte[]> theWholeFile = new List<byte[]>();
while ( moredata  ) 
{
  byte[] arrayRead = new byte[32768];
  moredata = lfa.Read(arrayRead);
  theWholeFile.Add(arrayRead);
}

The problem with the above is that reading the file this way takes up as much memory as the whole file (let's say 100 MB). And since I want to pass this as a return result from a WCF service, I would prefer to use a Stream as the output of the service.

How can I create a Stream object from this and pass it as a return parameter to a WCF service without occupying the full file size in memory?

I was thinking of creating a class LargeFileStream inheriting from

System.IO.Stream

and overriding the Read method. But I cannot figure out how to handle the fact that Stream.Read takes an offset parameter and a number of bytes to read, because the API I mentioned requires reading a fixed number of bytes on each call. Moreover, what about all the other members I have to override, such as Flush(), Position and whatever else there is? What should they implement? I am asking because I have no idea which members other than Stream.Read() WCF would call when the client (the caller of the WCF service) reads the stream.

Moreover, I need it to be serializable so that it can be an output parameter to a WCF service.

Thanks Jihad

Jihad Haddad

3 Answers


You can write your own stream to do what you want, using a single buffer of your API chunk size (i.e. 32 KB) and recycling it while reading. Sample code is below (note that it's not production ready and needs testing, but it should give you a start):

public class LargeFileApiStream : Stream {
    private readonly LargeFileApi _api;
    private bool _hasMore;
    private bool _done;
    private byte[] _buffer;
    const int ApiBufferSize = 32768;
    public LargeFileApiStream(LargeFileApi api) {
        _api = api;    
    }

    public override void Flush() {
        // you can ignore that, this stream is not writable
    }

    public override long Seek(long offset, SeekOrigin origin) {
        throw new NotSupportedException(); // not seekable, only read from beginning to end
    }

    public override void SetLength(long value) {
        throw new NotSupportedException(); // not writable
    }        

    public override void Write(byte[] buffer, int offset, int count) {
        throw new NotSupportedException(); // not writable
    }

    public override int Read(byte[] buffer, int offset, int count) {
        // if we reached end of stream before - done
        if (_done)
            return 0;

        if (_buffer == null) {
            // we are just starting, read first block
            _buffer = new byte[ApiBufferSize];
            _hasMore = _api.Read(_buffer);
        }

        var nextIndex = _position % ApiBufferSize;
        int bytesRead = 0;
        for (int i = 0; i < count; i++) {
            if (_buffer.Length <= nextIndex) {
                // ran out of current chunk - fetch next if possible
                if (!_hasMore) {
                    // we are done, nothing more to read
                    _done = true;
                    break;
                }
                _hasMore = _api.Read(_buffer);
                // reset next index back to 0, we are now reading the next chunk
                nextIndex = 0;
            }

            // copy one byte to the output buffer
            buffer[offset + i] = _buffer[nextIndex];
            nextIndex++;
            bytesRead++;
        }

        _position += bytesRead;
        return bytesRead;
    }

    public override bool CanRead {
        get { return true; }
    }
    public override bool CanSeek {
        get { return false; }
    }
    public override bool CanWrite {
        get { return false; }
    }
    public override long Length {
        get { throw new NotSupportedException(); }
    }

    private long _position;
    public override long Position
    {
        get { return _position; }
        set { throw new NotSupportedException(); } // not seekable
    }
}
Evk
  • This is exactly what I was thinking. I just wanted to make sure I was thinking the right way. Thanks – Jihad Haddad Apr 13 '16 at 07:55
  • How do I ensure that the connection to LargeFileAPI is closed correctly? I have tried some tests with FileStream and have realized that neither Dispose nor Close are called when the WCF service has stopped reading. – Jihad Haddad Apr 13 '16 at 08:47
  • WCF should dispose returned streams by default unless you changed that. If you use a custom stream like above, override the Dispose method of the base Stream class and close your API connection there. Also see this: http://stackoverflow.com/q/6483320/5311735 – Evk Apr 13 '16 at 08:52
  • Hi and thanks again. I have overridden Stream's Dispose, but it does not get called. Any idea of what I can do to force a call to it. – Jihad Haddad Apr 13 '16 at 12:39
  • Did you check that AutoDisposeParameters is true (as described at link above)? Do you just return plain Stream from WCF method call (not wrapped inside some other response object)? – Evk Apr 13 '16 at 12:41
  • Hi. I read it again and found out that because my Stream is a property in the returned object, I need that container class to implement the IDisposable interface. And now it works. Thanks a lot – Jihad Haddad Apr 13 '16 at 18:50
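
Following up on the disposal discussion in the comments, a minimal sketch of both pieces. The cleanup call on the API is hypothetical (adapt it to whatever LargeFileAPI actually exposes), and FileResponse is an illustrative container name:

public class LargeFileApiStream : Stream {
    // ... same members as in the answer above ...
    private bool _disposed;

    protected override void Dispose(bool disposing) {
        if (disposing && !_disposed) {
            _api.Close(); // hypothetical: release the underlying API connection
            _disposed = true;
        }
        base.Dispose(disposing);
    }
}

// If the stream is a property of a returned container object, WCF only
// disposes it when the container itself implements IDisposable:
[MessageContract]
public class FileResponse : IDisposable {
    [MessageBodyMember]
    public Stream Content { get; set; }

    public void Dispose() {
        if (Content != null)
            Content.Dispose();
    }
}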

Just store your data in a temporary file like this:

// create temporary stream
var stream = new FileStream(Path.GetTempFileName(), FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None, 4096, FileOptions.DeleteOnClose);

try
{
    // write all data to temporary stream
    while (moredata) 
    {
        byte[] arrayRead = new byte[32768];
        moredata = lfa.Read(arrayRead);
        stream.Write(arrayRead, 0, arrayRead.Length);
    }

    stream.Flush();

    stream.Position = 0; // Reset position so stream will be read from beginning
}
catch
{
    stream.Close(); // close stream to delete temporary file if an error occurred
    throw; // don't swallow the error; the stream is no longer usable
}

The temporary file stream holds the data received from LargeFileApi. You won't run out of memory since the data is actually stored in a file.

The temporary file will be deleted once the stream is closed, because of the FileOptions.DeleteOnClose option passed to the constructor. So you can just close the stream if something goes wrong, or when you are done reading.
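
Putting it together as a service operation might look roughly like this. This is only a sketch: the LargeFileAPI usage follows the question, and it relies on WCF's AutoDisposeParameters (true by default) to close the returned stream, and thereby delete the temp file, after the response is sent:

public Stream GetLargeFile(string name)
{
    var lfa = new LargeFileAPI(name); // API from the question
    var stream = new FileStream(Path.GetTempFileName(), FileMode.OpenOrCreate,
        FileAccess.ReadWrite, FileShare.None, 4096, FileOptions.DeleteOnClose);
    try
    {
        bool moredata = true;
        while (moredata)
        {
            byte[] chunk = new byte[32768];
            moredata = lfa.Read(chunk);
            stream.Write(chunk, 0, chunk.Length);
        }
        stream.Position = 0; // rewind so WCF reads from the beginning
        return stream;       // WCF disposes it after sending the response
    }
    catch
    {
        stream.Close(); // deletes the temp file on failure
        throw;
    }
}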

Maxim Kosov
  • Thanks. This is definitely a way to go. I have the following concerns though, which you might help me out with: 1. If something goes wrong (e.g. the web service crashes), then the file is not deleted. 2. It will not be so efficient to write the whole file to disk, and then read it again. 3. Moreover, am I sure that the web service has permission to write to a temporary file? What is the default/best practice here for the IIS setup? The production environment we have is known to be quite restrictive. – Jihad Haddad Apr 12 '16 at 20:33
  • About efficiency: @Evk has a nice idea of how to read data as a stream without a temporary buffer. But it is much more difficult to implement well, and there is a question about how LargeFileApi works. If LargeFileApi holds a connection until the file is read completely, then the connection can expire when the file wasn't read to completion. I had such issues (with expired connections) in production a long time ago and started to use the solution with a temporary file. So I would recommend testing both solutions and deciding what works best in your case – Maxim Kosov Apr 13 '16 at 06:48
  • About your concerns: 1. You can wrap while(moredata) {...} in a try...catch block and close the stream in case of an Exception. 2. It depends on what you want to do with the stream. 3. I'm 90% sure that the web service would have access to the _temporary directory_, but it is easy to check to be 100% sure – Maxim Kosov Apr 13 '16 at 07:00

You can perform the following:

  1. Create a WCF service with netTcpBinding. The service can return an object with the MessageContract attribute applied to it:

    [MessageContract]
    public class LargeStream
    {
        [MessageHeader]
        public int Section { get; set; }

        [MessageBodyMember]
        public Stream Data { get; set; }
    }

If you would like to add additional metadata, decorate the extra members with the MessageHeader attribute.

On the client side, the web application can consume the service and do the following:

  1. To get the number of sections for the file
  2. For each of the section request the stream
  3. Combine all the streams after the download is done into a single file.
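
A rough client-side sketch of those steps. GetSectionCount and GetSection are assumed operation names, not part of any real contract:

// Hypothetical client: first ask for the number of sections, then fetch
// each section's stream and append it to a single output file.
int sections = client.GetSectionCount("file1.bin"); // assumed operation
using (var output = File.Create("file1.bin"))
{
    for (int section = 0; section < sections; section++)
    {
        LargeStream response = client.GetSection("file1.bin", section); // assumed operation
        using (response.Data)
        {
            response.Data.CopyTo(output); // appends at the current position
        }
    }
}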