
I'm getting an OutOfMemoryException when changing the encoding of a stream. Until now the streams have been relatively small (under 50 MB), but I'm now handling a scenario where each stream is around 1.78 GB, so they're far larger than before.

As for the infrastructure, this runs in an Azure Cloud Service instance with 7 GB of memory, and the process (running as x64) is MAXING it out. I know the underlying issue is that I have too many copies of the reportStream in memory at once (upwards of 5 or 6).

Exception:

    System.OutOfMemoryException: Array dimensions exceeded supported range.
        at System.Text.Encoding.GetChars(System.Byte[] bytes, System.Int32 index, System.Int32 count) at offset 9
        at System.Text.Encoding.Convert(System.Text.Encoding srcEncoding, System.Text.Encoding dstEncoding, System.Byte[] bytes, System.Int32 index, System.Int32 count) at offset 61
        at System.Text.Encoding.Convert(System.Text.Encoding srcEncoding, System.Text.Encoding dstEncoding, System.Byte[] bytes) at offset 21
        at MyNamespace.MyClass.ChangeEncoding(System.IO.Stream reportStream) at offset 72 in ...
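A rough size check may help explain the message. The trace shows the failure inside `Encoding.GetChars`, which `Encoding.Convert` calls to build an intermediate `char[]`. Assuming each Windows-1252 byte decodes to exactly one UTF-16 `char` (2 bytes each), a 1.78 GB input (taken here as 1,780,000,000 bytes) needs a `char[]` of roughly 3.56 GB, which is above the 2 GB per-object limit that .NET Framework applies by default unless `<gcAllowVeryLargeObjects>` is enabled. That suggests the exception may come from this one oversized array rather than from total memory alone. The class and method names below are hypothetical and only reproduce the arithmetic.

```csharp
using System;

// Hypothetical helper that reproduces the size arithmetic; not part of any library.
public static class ConvertSizeEstimate
{
    // Roughly the default per-object cap on .NET Framework x64 when
    // <gcAllowVeryLargeObjects> is not enabled in the app config.
    public const long MaxObjectBytes = 2L * 1024 * 1024 * 1024;

    // Windows-1252 is single-byte: each input byte decodes to one UTF-16 char,
    // and each char occupies sizeof(char) == 2 bytes in the intermediate char[].
    public static long IntermediateCharArrayBytes(long inputBytes)
    {
        if (inputBytes < 0) throw new ArgumentOutOfRangeException(nameof(inputBytes));
        return inputBytes * sizeof(char);
    }

    // True when the char[] built by Encoding.GetChars would exceed the cap.
    public static bool ExceedsObjectCap(long inputBytes)
    {
        return IntermediateCharArrayBytes(inputBytes) > MaxObjectBytes;
    }
}
```

For the 1.78 GB case, `IntermediateCharArrayBytes(1780000000)` gives 3,560,000,000 and `ExceedsObjectCap` is true; for a 50 MB report the intermediate array is about 100 MB and fits, which is consistent with the code having worked until now.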

Code:

    private static Stream ChangeEncoding(Stream reportStream)
    {
        var utf8 = Encoding.UTF8;

        // The reports aren't actually ASCII-encoded.
        // They're in "Windows-1252", a Windows-specific superset of ASCII.
        // Windows-1252 adds printable characters in the 0x80-0xFF range (for example
        // curly quotes and the euro sign), whereas ASCII only defines 0x00-0x7F.
        // https://en.wikipedia.org/wiki/Windows-1252
        var win = Encoding.GetEncoding("Windows-1252");

        var length = (int) reportStream.Length;
        var buffer = new byte[length];
        int count;
        var sum = 0;

        // Keep reading until Read returns 0 (the end of the stream has been reached).
        while ((count = reportStream.Read(buffer, sum, length - sum)) > 0)
        {
            sum += count;
        }

        var convertedBytes = Encoding.Convert(win, utf8, buffer);

        var outputStream = new MemoryStream();

        outputStream.Write(convertedBytes, 0, convertedBytes.Length);

        // Reset the position; otherwise the report won't be zipped and then uploaded.
        outputStream.Position = 0;

        return outputStream;
    }

So my question is: how can I change this to convert the encoding in chunks rather than converting the entire stream at once?
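A minimal sketch of chunked conversion, following the `Encoding.Convert` documentation's advice to use a `Decoder`/`Encoder` pair when data arrives in sequential blocks. It assumes the caller can supply a destination stream (for example a `FileStream`, or the zip/upload stream directly) instead of a `MemoryStream`, since a `MemoryStream` would still hold the whole converted report in memory. `ChunkedTranscoder`, `Transcode`, and the 81,920-byte default buffer are illustrative choices, not an existing API.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: re-encodes a stream one buffer at a time.
public static class ChunkedTranscoder
{
    public static void Transcode(Stream input, Stream output, Encoding source, Encoding target,
                                 int bufferSize = 81920)
    {
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException(nameof(bufferSize));

        // Decoder/Encoder keep state between calls, so a multi-byte sequence split
        // across two reads is handled correctly (Windows-1252 never splits, but UTF-8 could).
        Decoder decoder = source.GetDecoder();
        Encoder encoder = target.GetEncoder();

        // Fixed-size buffers: memory use no longer grows with the size of the report.
        var inBytes = new byte[bufferSize];
        var chars = new char[source.GetMaxCharCount(bufferSize)];
        var outBytes = new byte[target.GetMaxByteCount(chars.Length)];

        int read;
        while ((read = input.Read(inBytes, 0, inBytes.Length)) > 0)
        {
            int charCount = decoder.GetChars(inBytes, 0, read, chars, 0, false);
            int byteCount = encoder.GetBytes(chars, 0, charCount, outBytes, 0, false);
            output.Write(outBytes, 0, byteCount);
        }

        // Flush any state still buffered in the decoder and encoder at end of input.
        int finalChars = decoder.GetChars(inBytes, 0, 0, chars, 0, true);
        int finalBytes = encoder.GetBytes(chars, 0, finalChars, outBytes, 0, true);
        output.Write(outBytes, 0, finalBytes);
    }
}
```

With this, memory use is bounded by the three buffers (a few hundred kilobytes) rather than the report size. To keep the `Stream`-returning shape of `ChangeEncoding`, one option is to transcode into a temporary `FileStream` opened with `FileOptions.DeleteOnClose` and set `Position = 0` before returning it. Unlike `StreamWriter`, `Encoder.GetBytes` never writes a byte order mark, so the output matches what `Encoding.Convert` produced.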

Cameron
  • @user1666620 I'm not using a `StreamReader`. – Cameron Feb 12 '16 at 17:31
  • Answer is in MSDN - https://msdn.microsoft.com/en-us/library/c20ss51h(v=vs.100).aspx - "If the data to be converted is available only in sequential blocks (such as data read from a stream) or if the amount of data is so large that it needs to be divided into smaller blocks, the application should use the Decoder or the Encoder provided by the GetDecoder method or the GetEncoder method, respectively, of a derived class." – Alexei Levenkov Feb 12 '16 at 17:36
  • In case somebody runs into this, using a StreamReader is a good idea: https://stackoverflow.com/questions/42551162/how-do-i-convert-encoding-of-a-large-file-1-gb-in-size-to-windows-1252-with – Matti Virkkunen Jun 13 '17 at 17:17
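The `StreamReader` route suggested in the last comment can be sketched the same way: the reader decodes Windows-1252 and the writer re-encodes, one `char[]` buffer at a time. This is a sketch under stated assumptions, not code from the linked answer; `StreamReaderTranscoder` is a hypothetical name, and the `leaveOpen` constructor overloads it uses require .NET Framework 4.5 or later. Passing `new UTF8Encoding(false)` as the target avoids the byte order mark that `StreamWriter` would otherwise write for `Encoding.UTF8`, which `Encoding.Convert` in the original code never produced.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: decode with StreamReader, re-encode with StreamWriter.
public static class StreamReaderTranscoder
{
    public static void Transcode(Stream input, Stream output, Encoding source, Encoding target,
                                 int charBufferSize = 81920)
    {
        // detectEncodingFromByteOrderMarks: false keeps the source encoding fixed;
        // leaveOpen: true leaves both streams open for the caller (e.g. to zip/upload).
        using (var reader = new StreamReader(input, source, false, 4096, true))
        using (var writer = new StreamWriter(output, target, 4096, true))
        {
            var buffer = new char[charBufferSize];
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                writer.Write(buffer, 0, read);
            }
        } // Disposing the writer flushes any remaining buffered output.
    }
}
```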
