6

I'm sure we're all familiar with, and probably use, the plethora of code provided in books, online, etc. for reading a file in C#. Something as simple as...

StringBuilder fiContents = new StringBuilder();
using (StreamReader fi = new StreamReader(@"C:\a_file.txt"))
{
    while (!fi.EndOfStream)
    {
        // ReadLine is a method call; AppendLine restores the line break that ReadLine strips.
        fiContents.AppendLine(fi.ReadLine());
    }
}

Or maybe something as short as...

using (StreamReader fi = new StreamReader(@"C:\a_file.txt"))
    fiContents.Append(fi.ReadToEnd());

Now let's go Super Saiyan for a moment and do really fancy stuff, like using a BackgroundWorker, which allows us to show a loading image (this is what I'll use), or to provide a process countdown timer or a ProgressBar.

public void ReadFile(string filename)
{
    BackgroundWorker procFile = new BackgroundWorker();
    // Progress 1: If we want to show the progress we need to enable the following property
    // procFile.WorkerReportsProgress = true;

    procFile.DoWork += new DoWorkEventHandler((object obj, DoWorkEventArgs ev) =>
    {
        StringBuilder fiContents = new StringBuilder();

        using (StreamReader fi = new StreamReader(filename))
        {
            while (!fi.EndOfStream)
            {
                // Progress 2: Report the progress, this will be dealt with by the respective handler (below).
                // procFile.ReportProgress((int)(fi.BaseStream.Position * 100 / fi.BaseStream.Length));

                // ReadLine is a method call; AppendLine restores the line break that ReadLine strips.
                fiContents.AppendLine(fi.ReadLine());
            }
        }

        ev.Result = fiContents;
    });

    /* Progress 3: The handler below will take care of updating the progress of the file as it's processed. 
    procFile.ProgressChanged += new ProgressChangedEventHandler((object obj, ProgressChangedEventArgs ev) =>
    {
        // Progress 4: Do something with the value, such as update a ProgressBar. 
        // ....
    });
    */

    procFile.RunWorkerCompleted += new RunWorkerCompletedEventHandler((object obj, RunWorkerCompletedEventArgs ev) =>
    {
         // Do something with the result (ev.Result), bearing in mind, it is a StringBuilder and the ev.Result is an object. 
         StringBuilder result = ev.Result as StringBuilder; 

         // ....
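    });

    // Start the worker; without this call none of the handlers above ever run.
    procFile.RunWorkerAsync();
}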

+++++ +++++ +++++ +++++

Time for the actual question... The above was a warm-up and to show a current level of understanding so I don't face these as prospective answers.

I'm pretty much doing the last code example given above (i.e. using a BackgroundWorker) and dumping the contents of what is read to a RichTextBox. Simple stuff really.

The problem I'm facing, however, is processing large files (e.g. ~222MB). The case is simply taking a .txt file, reading it, and pushing the result, built through a StringBuilder, into the RichTextBox. It cannot load the file; I get an OutOfMemoryException. One way around this, which takes a considerable amount of time (and still doesn't load the file), is iterating through the string and adding each character (as a char) from the file's StringBuilder.

I've always used the most basic and straightforward means of reading files (such as the examples given above), but does anyone have any guidance on how to improve on this? Ways of processing extremely large files? etc.

Even as a discussion piece, I'd welcome your ideas.

+++++ +++++ +++++ +++++

Edit 1 (@TaW): the exception was thrown when trying to put the string into the RichTextBox...

FileProcessing.RunWorkerCompleted += new RunWorkerCompletedEventHandler((object obj, RunWorkerCompletedEventArgs e) =>
{
    // 'Code' is the RichTextBox in question...

    Code.Text = "";

    if (e.Result is StringBuilder)
    {
        Code.Text = (e.Result as StringBuilder).ToString();
    }
});
user1092809
  • Personally, I just use `File.ReadAllText(filename)` but then I'm lazy. – Will Apr 21 '14 at 13:38
  • The short answer is that to load a file that's too big for memory, you simply can't load it all at once. Loading only the part of the file that is currently in view in the scrolling control is a common solution. – Kendall Frey Apr 21 '14 at 13:41
  • Reading a 200MB file should take so little time that a progress bar and a background worker is overkill. – Lasse V. Karlsen Apr 21 '14 at 13:43
  • @Will: nothing wrong with that! ;) – user1092809 Apr 21 '14 at 13:45
  • @KendallFrey: I do appreciate that the file size example I gave is quite large so I can't simply load it all into memory (or perhaps shouldn't). Could you perhaps give a code example with the scrolling solution? – user1092809 Apr 21 '14 at 13:46
  • 200MB is not really large for a textfile you want to read into memory and process from there, imo. (Obviously depending on your machine) But building up a RTF from it may well take its time and resources. But where __exactly__ did you get the OutOfMemoryException? – TaW Apr 21 '14 at 13:51
  • @LasseV.Karlsen: You are more than welcome to use the code I've given above and see for yourself, then report back. What's more, if you read the question, I included the code examples to show various ways in which a file might be read, and features which may improve on them. For a 200MB file this would take some time; even if it took 10-30 seconds, I would argue it's good UI design to inform the user of what's going on. – user1092809 Apr 21 '14 at 13:52
  • A code sample is too long for me to write up right now, but if you use google, you may be able to find something. Keyword "virtualization". WPF actually has a virtualized listbox, but no textbox. – Kendall Frey Apr 21 '14 at 13:53
  • I would recommend WPF Virtualization (you can take a look here [WPF: Data Virtualization](http://www.codeproject.com/Articles/34405/WPF-Data-Virtualization)) together with some clever caching (cache some lines above and below the current lines viewed). This can result in multiple reads for the same piece of the file but you can load any file. – smiron Apr 21 '14 at 13:55
  • Also, for positioning inside the file you can use FileStream.Position. PS: see my previous comment – smiron Apr 21 '14 at 14:03
  • http://www.codeproject.com/Articles/35438/Asynchronous-stream-reader-with-progress-bar-suppo – User2012384 Apr 21 '14 at 14:07
  • I think this is what you want. – User2012384 Apr 21 '14 at 14:07
  • I believe there is no virtualization for a rich textbox because virtualization works best with IList, not IEnumerable (and strings could be thought of as IEnumerable). Behind the scenes the virtualization makes use of indexers so the entire collection doesn't need to be iterated to get to a given position (the section we want to display). Imagine trying to scroll backwards through an IEnumerable...not possible without starting over at the beginning. – Jason Down Apr 21 '14 at 14:51
  • @user1092809 Reading a file that is 200MB on my machine takes .6 seconds (+/-) when reading one line at a time, a progress bar here would disappear before the user reacts to it being there. – Lasse V. Karlsen Apr 21 '14 at 17:29
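
The idea raised in the comments above (load only the part of the file that is in view, positioning with FileStream.Position) could be sketched roughly as follows. This is only a minimal illustration, not anyone's actual implementation: the `FileWindow` class and `ReadWindow` method are hypothetical names, and the code assumes the file is UTF-8 or ASCII, so a window boundary may split a multi-byte character.

```csharp
using System;
using System.IO;
using System.Text;

public static class FileWindow
{
    // Hypothetical helper: reads up to `count` bytes starting at byte `offset`
    // and decodes them as UTF-8 (assumption: the file is UTF-8 or ASCII).
    public static string ReadWindow(string path, long offset, int count)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            // Nothing to read past the end of the file.
            if (offset >= fs.Length)
                return string.Empty;

            // Jump straight to the requested position instead of reading everything before it.
            fs.Position = offset;

            byte[] buffer = new byte[count];
            int total = 0;

            // Read may return fewer bytes than requested, so loop until the window is full or EOF.
            while (total < count)
            {
                int read = fs.Read(buffer, total, count - total);
                if (read == 0)
                    break;
                total += read;
            }

            return Encoding.UTF8.GetString(buffer, 0, total);
        }
    }
}
```

A viewer would call something like `FileWindow.ReadWindow(path, scrollOffset, 64 * 1024)` whenever the visible region changes, so only one window of text is ever held in the control.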

3 Answers

2

Is there a restriction you have that requires you to use a RichTextBox as the control to display your content? This control is not virtualized and will cause you performance (and by the looks of it memory error) issues.

There is a family of document viewing controls that are better designed for displaying large documents. Various controls exist depending on your needs (fixed, flowing via page or scrolling). In addition, you get searching, printing, zooming and a few other features that are often useful for viewing large documents.

Jason Down
  • I'm actually developing a script editor (with highlighting and intelli-sense), which is now complete and required a RichTextBox so I could achieve the highlighting and other features. So they will be plain-text files. – user1092809 Apr 21 '14 at 16:00
0

Have you tried memory-mapped files (the MemoryMappedFile class)?
It's a pretty useful library for handling large files.
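
As a rough sketch of what that might look like with the `System.IO.MemoryMappedFiles` namespace (available since .NET 4.0): the `MappedReader` class and `ReadSlice` method are hypothetical names chosen for illustration, and the text is assumed to be UTF-8. The file is mapped read-only and only the requested slice is copied into managed memory.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

public static class MappedReader
{
    // Hypothetical helper: returns `count` bytes from `offset`, decoded as UTF-8
    // (assumption), without reading the rest of the file into memory.
    public static string ReadSlice(string path, long offset, int count)
    {
        long length = new FileInfo(path).Length;
        if (offset < 0 || offset >= length || count <= 0)
            return string.Empty;

        // Clamp the slice so the view never extends past the end of the file.
        int size = (int)Math.Min(count, length - offset);

        // Capacity 0 means "use the current size of the file on disk".
        using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(
                   path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        using (MemoryMappedViewStream view = mmf.CreateViewStream(
                   offset, size, MemoryMappedFileAccess.Read))
        {
            byte[] buffer = new byte[size];
            int total = 0;

            // Loop in case a single Read call returns fewer bytes than requested.
            while (total < size)
            {
                int read = view.Read(buffer, total, size - total);
                if (read == 0)
                    break;
                total += read;
            }

            return Encoding.UTF8.GetString(buffer, 0, total);
        }
    }
}
```

Note that this only helps with reading the file; it does nothing about how much text the control displaying it can hold.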

Mithun Ashok
0

This is not about advanced reading but about hitting the capacity limits of (WinForms) controls. Maybe you can get it to work in WPF, but in WinForms neither a RichTextBox nor a TextBox can hold such a large amount of lines/text.

I advise you to redesign this to present the data to the users in smaller chunks. It is not as if they would want to scroll through 100,000+ lines. Processing them in memory is not an issue; 200MB is not large at all here, and you can, for example, easily search through it in memory.
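
One possible way to present the data in smaller chunks is to page through the file by line. The following is a minimal sketch only: `LinePager` and `ReadPage` are hypothetical names, and the page size is whatever suits the UI. `File.ReadLines` enumerates lazily, so only the requested page is materialised.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LinePager
{
    // Hypothetical helper: returns the lines of page `pageIndex` (zero-based),
    // where each page holds at most `pageSize` lines.
    public static List<string> ReadPage(string path, int pageIndex, int pageSize)
    {
        // File.ReadLines streams the file line by line; Skip/Take discard everything
        // outside the requested page, and ToList forces the read before returning.
        return File.ReadLines(path)
                   .Skip(pageIndex * pageSize)
                   .Take(pageSize)
                   .ToList();
    }
}
```

The trade-off is that every call scans from the start of the file, so pages far into a large file are slower to fetch; caching the byte offset at which each page starts would avoid the rescan.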

TaW
  • The question I posed, whilst being asked through a problem (which was indeed down to the limitations of WinForms controls), was about further techniques in file I/O with C#, and was meant as a discussion. Hence why I provided many code examples: to demonstrate some of the techniques I'm familiar with, which are probably what the majority of others use. I completely agree with your second point. – user1092809 Apr 22 '14 at 13:11