
I'm reading in a file (the file consists of one long string which is 2GB in length).

This is my function, which reads the entire contents of the file into memory and splits it into chunks (`_reader` is a `StreamReader`):

public List<char[]> GetAllContentAsList()
{
    int charsToRead = 1000000;
    char[] buffer = new char[charsToRead];
    List<char[]> results = new List<char[]>();

    int read;
    while ((read = _reader.Read(buffer, 0, charsToRead)) != 0)
    {
        // Copy only the characters actually read; Read may return
        // fewer than requested, especially on the final chunk.
        char[] temp = new char[read];
        Array.Copy(buffer, temp, read);
        results.Add(temp);
    }

    return results;
}

When all the data is placed into the List, it takes up 4GB of RAM. How is this possible when the file is only 2GB in size?

**Edit**

This is what I ended up doing. I'm not converting the array of bytes to a string; I'm just passing the bytes on and manipulating them. This way the file only takes up 2GB in memory instead of 4GB:

public List<byte[]> GetAllContentAsList()
{
    int bytesToRead = 1000000;
    var buffer = new byte[bytesToRead];
    List<byte[]> results = new List<byte[]>();

    // _reader must expose Read(byte[], int, int) here, i.e. a Stream
    // or BinaryReader rather than a StreamReader.
    int read;
    while ((read = _reader.Read(buffer, 0, bytesToRead)) != 0)
    {
        byte[] b = new byte[read];
        Array.Copy(buffer, b, read);
        results.Add(b);
    }

    return results;
}
Ivan Bacher
  • How do you come to the conclusion that the List takes up 4GB of memory? A single object is limited to 2GB. You do understand that the line `Array.Copy(buffer,temp,bytesToRead);` continues to eat up memory until the Garbage Collector decides to clean up after you, right? – Security Hound May 08 '13 at 13:37
  • Can you even use 4GB in C#? – Venson May 08 '13 at 13:38
  • @Venson - on a 64bit OS and process, sure, why not? – Oded May 08 '13 at 13:38
  • like Oded says, `char` can be bigger than the encoded `byte`(s). Why the arbitrary `List` creation and excessive array cloning in your code? `Files.ReadAllText("yourfile").ToCharArray()` seems equivalent. – Jodrell May 08 '13 at 13:40
  • @Oded I'm currently working on a program that uses a lot of RAM (converting multiple PDFs into bitmaps and printing them), and I have had many problems with the 1.5GB limit (even with LARGEADDRESSAWARE and/or x64 compiling). You may have the reference you need, but I think pushing a real 4GB of data into your RAM is not that easy or smart. – Venson May 08 '13 at 13:42
  • That is a pretty horrible way to read data, btw; it would be much better to use a streaming API (or a reader-based API). – Marc Gravell May 08 '13 at 13:44
  • @Venson - What 1.5GB limit exactly? .NET 4.0 allows for the use of very large objects with a flag. It's not even clear how the author determines the memory usage of the collection. – Security Hound May 08 '13 at 13:53
  • @Ramhound I have a lot of images as byte arrays, and when I try to store more than around 1GB to 2GB of data in my program, the application crashes with an OutOfMemory exception. After a while of searching I found an MSDN post describing that it is not possible to do something like this successfully, but it seems LARGEADDRESSAWARE is a workaround of sorts; it does not help me though... this [post](http://www.codeproject.com/Articles/483475/Memory-Limits-in-a-NET-Process) describes that it's not even possible to allocate more than 1.3GB. – Venson May 08 '13 at 14:10

1 Answer


Educated guess here:

The file is UTF-8 or ASCII encoded and only (or mostly) contains single-byte characters (or possibly some other codepage that is mostly single-byte).

.NET characters, however, are UTF-16, so each `char` takes 2 (or more) bytes.

So, once decoded into chars, the data will be double the size in memory.
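A quick way to see the doubling (a minimal sketch; a 1000-character string stands in for the 2GB file):

```csharp
using System;
using System.Text;

class EncodingSizeDemo
{
    static void Main()
    {
        // 1000 ASCII characters: one byte each when encoded as UTF-8...
        string text = new string('a', 1000);
        byte[] utf8 = Encoding.UTF8.GetBytes(text);

        // ...but two bytes each as UTF-16 chars in memory.
        char[] chars = text.ToCharArray();

        Console.WriteLine(utf8.Length);                 // 1000
        Console.WriteLine(chars.Length * sizeof(char)); // 2000
    }
}
```

The same 2x ratio applies to the 2GB file: 2GB of single-byte UTF-8 text becomes 4GB of UTF-16 chars once read through a `StreamReader`.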

Oded
  • +1 Easily testable by changing the encoding of the file as it is saved. – slugster May 08 '13 at 13:36
  • That's probably it. [Char](http://msdn.microsoft.com/library/7sx7t66b.aspx) is 16-bit (2-byte). – Corak May 08 '13 at 13:37
  • @slugster - Sure, but with a 2GB file, I'll leave that to you to test ;) – Oded May 08 '13 at 13:37
  • Also, if i recall, `Array.Copy` may have memory implications on its own merit (though that's more during the working process, not the end result). – Brad Christie May 08 '13 at 13:37
  • Also, that list is going to be resizing itself throughout the operation and leaving behind objects that the GC may not have collected by the time you look at memory usage. – ta.speot.is May 08 '13 at 13:42
  • Thank you your answer solved my problem. I ended up just passing the bytes around, not converting to chars – Ivan Bacher May 09 '13 at 08:29