
I'm running into an OutOfMemoryException. My flow was as follows:

  1. I read the data from the database (big collection, millions of rows).
  2. I loop through the retrieved data and append it to a StringBuilder.
  3. I call ToString on the StringBuilder and upload the string to a new file using SFTP (Renci.SshNet).

This flow didn't work because there was too much data to keep in memory. I decided to make some changes:

  • The data is now fetched using a data reader and yield return (instead of returning the whole dataset)
  • I can't call ToString on the StringBuilder because the resulting string would be too big to fit in memory. I changed the StringBuilder to a StreamWriter so I can write directly to a stream, which can then be used by the SFTP upload process.

This is where things started to get tricky. I want to append new data to the StreamWriter in my data processing code while the SFTP upload code reads from the same stream, GZips it and uploads the data. Note that it should create only one file on the remote server. In other words: I want to process data while uploading the already-processed data, so I can keep a very low memory footprint. I can't get this working. This is what I came up with (simplified example), but it never reads the data:

static void Main()
{
    using (var stream = new MemoryStream())
    {
        WriteToStream(stream, GetData());
        ReadFromStream(stream);
    }
}

static IEnumerable<string> GetData()
{
    for (int i = 0; i < 10; i++)
    {
        var date = DateTime.Now;
        Console.WriteLine("{0:O} - Fetched some data: {1}", date, i);
        yield return string.Format("{0}", i);
    }
}

static void WriteToStream(Stream stream, IEnumerable<string> data)
{
    using (var writer = new StreamWriter(stream))
    {
        foreach (string str in data)
        {
            Console.WriteLine("{0:O} - Wrote some data: {1}", DateTime.Now, str);
            writer.WriteLine("{0}", str);
        }
    }
}

static void ReadFromStream(Stream stream)
{
    using (var reader = new StreamReader(stream))
    {
        while (reader.Peek() >= 0)
        {
            var res = reader.ReadLine();
            Console.WriteLine("{0} - Read some data: {1}", DateTime.Now, res);
        }
    }
}

Can anyone point me in the right direction?

Leon Cullens
    See http://stackoverflow.com/questions/24253975/can-i-get-a-gzipstream-for-a-file-without-writing-to-intermediate-temporary-stor, which is very similar to what you're trying to do. – Jim Mischel Dec 09 '14 at 17:09
    Your example doesn't work because the `StreamWriter` closes the underlying stream (the `MemoryStream`) when the `StreamWriter` is closed. You can use [this overloaded constructor](http://msdn.microsoft.com/en-us/library/gg712853(v=vs.110).aspx) to prevent that. You'll also want to set the `MemoryStream` position to 0 before calling `ReadFromStream`. But that still won't give you what you want, because the FTP upload component probably won't cooperate. The link I provided above will let you do what you want. You will, however, need to have either the producer or consumer on a separate thread. – Jim Mischel Dec 09 '14 at 20:54
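Putting the two fixes from that comment together, here is a minimal sketch of the original example that does read the data back: the `StreamWriter` is constructed with `leaveOpen: true` so disposing it doesn't close the `MemoryStream`, and the stream position is reset to 0 before reading. Note the caveat from the comment still applies: this buffers everything in the `MemoryStream`, so it fixes the example but not the memory footprint.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class Program
{
    static void Main()
    {
        using (var stream = new MemoryStream())
        {
            WriteToStream(stream, GetData());
            stream.Position = 0; // rewind before reading, otherwise the reader starts at the end
            ReadFromStream(stream);
        }
    }

    static IEnumerable<string> GetData()
    {
        for (int i = 0; i < 10; i++)
            yield return i.ToString();
    }

    static void WriteToStream(Stream stream, IEnumerable<string> data)
    {
        // leaveOpen: true keeps the underlying MemoryStream usable
        // after the StreamWriter is disposed
        using (var writer = new StreamWriter(stream, Encoding.UTF8, 1024, leaveOpen: true))
        {
            foreach (string str in data)
                writer.WriteLine(str);
        }
    }

    static void ReadFromStream(Stream stream)
    {
        using (var reader = new StreamReader(stream))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                Console.WriteLine("Read some data: {0}", line);
        }
    }
}
```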

1 Answer


Instead of creating a new stream yourself and passing it to the SFTP component, use the SFTP component's method for opening a new stream (SftpClient.Open(...)) and write to that. It's similar to your WriteToStream method; you'll probably just have to replace WriteLine with Write.
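A sketch of that idea, combined with the GZip requirement from the question: the host name, credentials, remote path and `GetData()` below are placeholders, not real values. Rows flow from the `yield return` enumerator through the `GZipStream` straight into the remote stream returned by `SftpClient.Open`, so only the current row plus small buffers is held in memory, and only one file is created on the server.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using Renci.SshNet;

static class UploadSketch
{
    // Placeholder for the real data-reader-backed enumerator from the question.
    static IEnumerable<string> GetData()
    {
        for (int i = 0; i < 10; i++)
            yield return i.ToString();
    }

    static void Main()
    {
        // Placeholder connection details.
        using (var client = new SftpClient("sftp.example.com", "user", "password"))
        {
            client.Connect();

            // Open a writable stream directly on the remote file...
            using (Stream remote = client.Open("/upload/data.gz", FileMode.Create))
            // ...wrap it in a GZipStream so rows are compressed as they are written...
            using (var gzip = new GZipStream(remote, CompressionMode.Compress))
            using (var writer = new StreamWriter(gzip))
            {
                // ...and stream the rows through one at a time.
                foreach (string row in GetData())
                    writer.WriteLine(row);
            }

            client.Disconnect();
        }
    }
}
```

With this shape the producer/consumer threading problem from the comments disappears: the SFTP stream is the consumer, pulled along synchronously by the writing loop.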

The rest of your issue is still interesting though :)

Robert Sirre