I'm having a problem with a C# Windows Forms application.
My goal is to split a big file (possibly larger than 5 GB) into smaller files of one million lines each.
I have no idea why the code below runs out of memory.

Thanks.

StreamReader readfile = new StreamReader(...);
StreamWriter writefile = new StreamWriter(...);    
string content;
while ((content = readfile.ReadLine()) != null)
{
    writefile.Write(content + "\r\n");
    i++;
    if (i % 1000000 == 0)
    {
        index++;
        writefile.Close();
        writefile.Dispose();
        writefile = new StreamWriter(...);  
    }
    label5.Text = i.ToString();
    label5.Update();
}
xanatos
yin
  • The file could have an extra-long line... If, for example, the file has a 100 MB line, then you could have a problem. – xanatos Aug 13 '15 at 09:55
  • xanatos is right. Print the line length to the console (`if (length > 100000) print(length)`). – usr Aug 13 '15 at 09:58
  • Use the `Read` method and specify the number of characters to read: `char[] ch = new char[100000];`, then fill it with `readfile.Read(ch, 0, ch.Length);`. Finally, write the filled char[] (`ch`) to the file. – M.kazem Akhgary Aug 13 '15 at 10:01
  • @xanatos But there is no extra-long line in my file. Each line contains just 100~200 words. – yin Aug 13 '15 at 10:02
  • How do you know this? You say each file contains a million lines? Or maybe you generated those files yourself? – M.kazem Akhgary Aug 13 '15 at 10:04
  • @yin Then try commenting out the `label5.Text = ` and `label5.Update()`... Updating the screen so fast is useless... You want to update it once a second or so. – xanatos Aug 13 '15 at 10:04
  • Would a writeFile.Flush() every few iterations help? – GinjaNinja Aug 13 '15 at 10:21
  • With strings being immutable, could it be due to all the lines being stored in memory and not garbage collected? Would changing 'content' to a StringBuilder resolve it? – Iain Ward Aug 13 '15 at 10:37
  • the current string could be garbage collected at the end of the `while` block, it might be a memory fragmentation problem – thumbmunkeys Aug 13 '15 at 10:47
  • There is no fragmentation if the heap is nearly empty. The problem has been identified already: long lines. The OP refuses to debug this. Therefore I'm closing because we can't answer without cooperation. – usr Aug 13 '15 at 11:19
  • I'm voting to close this question as off-topic because the OP is refusing to try out the theories that were presented. No non-guess answers are possible. – usr Aug 13 '15 at 11:20
  • @usr, M.kazem Akhgary: Sorry to have made you angry, but I am not refusing to try. I ran the program many times, and it broke at a different "i" each time, so I thought there was no error in the data. But I was rushing to go home, so I forgot to explain my reasoning. I generated the file with another piece of software. Maybe I trust it too much. I am not at my computer now. I will check it tomorrow. Thanks for your replies. – yin Aug 13 '15 at 12:14
  • @GinjaNinja: I tried that yesterday but it didn't work, so I deleted it. Thanks. – yin Aug 13 '15 at 12:21
  • @xanatos: I tried commenting out the label updates and it worked! The efficiency improved too. But I don't know why it ran out of memory, and I couldn't find any information about it. Does anyone know? Thanks a lot! – yin Aug 14 '15 at 02:23
  • I don't think it's efficient to update a label on every ReadLine, because ReadLine is ridiculously fast (it can reach about 5000 rows per second). Normally you keep the progress status in a public property populated by a background worker, then copy the value to the label on each timer tick rather than on each loop iteration. – ken lacoste Aug 14 '15 at 06:28

1 Answer

The error is probably in these lines:

label5.Text = i.ToString();
label5.Update();

As a test, I wrote something like:

for (int i = 0; i < int.MaxValue; i++)
{
    label1.Text = i.ToString();
    label1.Update();
}

The app freezes when the counter reaches around 16000-18000 (Windows 7 Pro SP1 x64, with the app running as both x86 and x64).

What probably happens is that by running your long operation in the main thread of the app, you stall the message queue of the window, and at a certain point it freezes. You can see that this is the problem by adding a

Application.DoEvents();

instead of the

label5.Update();

But even this is a false solution. The correct solution is to move the copying to another thread and update the control every x milliseconds, using the Invoke method (because you are on a secondary thread).

For example:

public void Copy(string source, string dest)
{
    const int updateMilliseconds = 100;

    int index = 0;
    int i = 0;
    StreamWriter writefile = null;

    try
    {
        using (StreamReader readfile = new StreamReader(source))
        {
            writefile = new StreamWriter(dest + index);

            // Initial value "back in time". Forces initial update
            int milliseconds = unchecked(Environment.TickCount - updateMilliseconds);

            string content;
            while ((content = readfile.ReadLine()) != null)
            {
                writefile.Write(content);
                writefile.Write("\r\n"); // Split into two calls to avoid a string concatenation
                i++;

                if (i % 1000000 == 0)
                {
                    index++;
                    writefile.Dispose();
                    writefile = new StreamWriter(dest + index);

                    // Force update
                    milliseconds = unchecked(milliseconds - updateMilliseconds);
                }

                int milliseconds2 = Environment.TickCount;

                int diff = unchecked(milliseconds2 - milliseconds);

                if (diff >= updateMilliseconds)
                {
                    milliseconds = milliseconds2;
                    Invoke((Action)(() => label5.Text = string.Format("File {0}, line {1}", index, i)));
                }
            }
        }
    }
    finally
    {
        if (writefile != null)
        {
            writefile.Dispose();
        }
    }

    // Last update
    Invoke((Action)(() => label5.Text = string.Format("File {0}, line {1} Finished", index, i)));
}

and call it with:

var thread = new Thread(() => Copy(@"C:\Temp\lst.txt", @"C:\Temp\output"));
thread.Start();

Note how it updates label5 every 100 milliseconds, plus once at the beginning (by setting the initial value of milliseconds "back in time"), each time the output file is changed (again by setting the value of milliseconds "back in time"), and once more after everything has been disposed.
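
The "back in time" throttling can also be isolated into a small helper. What follows is only a minimal sketch and not part of the code above: the class name UpdateThrottle and its members are hypothetical, and the tick count is passed in as a parameter (in a real app you would pass Environment.TickCount) so the logic can be exercised without waiting.

```csharp
using System;

// Hypothetical helper: decides whether enough time has passed since the last UI update.
public sealed class UpdateThrottle
{
    private readonly int intervalMilliseconds;
    private int lastUpdate;

    public UpdateThrottle(int intervalMilliseconds, int now)
    {
        this.intervalMilliseconds = intervalMilliseconds;
        // Start "back in time" so that the first ShouldUpdate call returns true.
        lastUpdate = unchecked(now - intervalMilliseconds);
    }

    // Moves the last update "back in time" so that the next ShouldUpdate call
    // returns true (for example after switching to a new output file).
    public void Force()
    {
        lastUpdate = unchecked(lastUpdate - intervalMilliseconds);
    }

    public bool ShouldUpdate(int now)
    {
        // Unchecked subtraction stays correct when the tick count wraps
        // around from int.MaxValue to int.MinValue.
        int elapsed = unchecked(now - lastUpdate);
        if (elapsed < intervalMilliseconds)
        {
            return false;
        }

        lastUpdate = now;
        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        var throttle = new UpdateThrottle(100, 0);
        Console.WriteLine(throttle.ShouldUpdate(0));   // True (forced initial update)
        Console.WriteLine(throttle.ShouldUpdate(50));  // False (only 50 ms elapsed)
        Console.WriteLine(throttle.ShouldUpdate(100)); // True (100 ms elapsed)
        throttle.Force();
        Console.WriteLine(throttle.ShouldUpdate(120)); // True (forced)
    }
}
```

Inside the copy loop this would be called as throttle.ShouldUpdate(Environment.TickCount) before each Invoke, with throttle.Force() called whenever the output file changes.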

An even better example can be written using the BackgroundWorker class, which exists explicitly for this scenario. It has a ProgressChanged event that you can subscribe to in order to update the window.

Something like this:

private void button1_Click(object sender, EventArgs e)
{
    BackgroundWorker backgroundWorker = new BackgroundWorker();
    backgroundWorker.WorkerReportsProgress = true;
    backgroundWorker.ProgressChanged += backgroundWorker_ProgressChanged;
    backgroundWorker.RunWorkerCompleted += backgroundWorker_RunWorkerCompleted;
    backgroundWorker.DoWork += backgroundWorker_DoWork;
    backgroundWorker.RunWorkerAsync(new string[] { @"C:\Temp\lst.txt", @"C:\Temp\output" });
}

private void backgroundWorker_DoWork(object sender, DoWorkEventArgs e)
{
    BackgroundWorker worker = sender as BackgroundWorker;

    string[] arguments = (string[])e.Argument;
    string source = arguments[0];
    string dest = arguments[1];

    const int updateMilliseconds = 100;

    int index = 0;
    int i = 0;
    StreamWriter writefile = null;

    try
    {
        using (StreamReader readfile = new StreamReader(source))
        {
            writefile = new StreamWriter(dest + index);

            // Initial value "back in time". Forces initial update
            int milliseconds = unchecked(Environment.TickCount - updateMilliseconds);

            string content;
            while ((content = readfile.ReadLine()) != null)
            {
                writefile.Write(content);
                writefile.Write("\r\n"); // Split into two calls to avoid a string concatenation
                i++;

                if (i % 1000000 == 0)
                {
                    index++;
                    writefile.Dispose();
                    writefile = new StreamWriter(dest + index);

                    // Force update
                    milliseconds = unchecked(milliseconds - updateMilliseconds);
                }

                int milliseconds2 = Environment.TickCount;

                int diff = unchecked(milliseconds2 - milliseconds);

                if (diff >= updateMilliseconds)
                {
                    milliseconds = milliseconds2;
                    worker.ReportProgress(0, new int[] { index, i });
                }
            }
        }
    }
    finally
    {
        if (writefile != null)
        {
            writefile.Dispose();
        }
    }

    // For the RunWorkerCompleted
    e.Result = new int[] { index, i };
}

void backgroundWorker_ProgressChanged(object sender, ProgressChangedEventArgs e)
{
    int[] state = (int[])e.UserState;
    label5.Text = string.Format("File {0}, line {1}", state[0], state[1]);
}

void backgroundWorker_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
    int[] state = (int[])e.Result;
    label5.Text = string.Format("File {0}, line {1} Finished", state[0], state[1]);
}
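
On .NET 4.5 or later, the same idea can be expressed with the IProgress<T> interface instead of BackgroundWorker. This is a different technique from the one shown above, offered only as a rough sketch: the LineSplitter class, its Split method, and the linesPerFile parameter are hypothetical names, and unlike the code above it opens each output file lazily, so no empty trailing file is created when the line count is an exact multiple of linesPerFile.

```csharp
using System;
using System.IO;

public static class LineSplitter
{
    // Hypothetical helper: copies 'source' into files named destPrefix + 0, 1, 2, ...
    // with at most linesPerFile lines each. Returns the total number of lines copied.
    public static int Split(string source, string destPrefix, int linesPerFile, IProgress<int> progress)
    {
        const int updateMilliseconds = 100;
        int index = 0;
        int i = 0;
        // Initial value "back in time" forces a first progress report.
        int last = unchecked(Environment.TickCount - updateMilliseconds);
        StreamWriter writefile = null;

        try
        {
            using (var readfile = new StreamReader(source))
            {
                string content;
                while ((content = readfile.ReadLine()) != null)
                {
                    if (writefile == null)
                    {
                        writefile = new StreamWriter(destPrefix + index);
                    }

                    writefile.Write(content);
                    writefile.Write("\r\n");
                    i++;

                    if (i % linesPerFile == 0)
                    {
                        // Close the full file; the next one is opened on demand.
                        writefile.Dispose();
                        writefile = null;
                        index++;
                    }

                    int now = Environment.TickCount;
                    if (progress != null && unchecked(now - last) >= updateMilliseconds)
                    {
                        last = now;
                        progress.Report(i);
                    }
                }
            }
        }
        finally
        {
            if (writefile != null)
            {
                writefile.Dispose();
            }
        }

        return i;
    }
}

public static class Program
{
    // Minimal IProgress<int> that prints synchronously (console demo only).
    private sealed class ConsoleProgress : IProgress<int>
    {
        public void Report(int value) { Console.WriteLine("line " + value); }
    }

    public static void Main()
    {
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        string source = Path.Combine(dir, "input.txt");
        File.WriteAllLines(source, new[] { "a", "b", "c", "d", "e" });

        int total = LineSplitter.Split(source, Path.Combine(dir, "output"), 2, new ConsoleProgress());
        Console.WriteLine("total lines: " + total); // total lines: 5

        Directory.Delete(dir, true);
    }
}
```

In a form, you would create a `new Progress<int>(n => label5.Text = n.ToString())` on the UI thread and run the copy with `await Task.Run(() => LineSplitter.Split(source, dest, 1000000, progress))`. Progress<T> captures the synchronization context where it is constructed, so the callback runs on the UI thread without an explicit Invoke.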
xanatos