I am working on a C# program that reads very large files and checks them for various attributes and fields. I had been testing with files of under 1 million lines, and it was performing as expected. I recently tested it on a file with 2.5 million lines, and it took 4 hours to run.
I am using a custom reading function that reads one character at a time so that I can find every CR and LF, because it is very important that every line contains both. I tested the reading function on its own, and it took about 14 minutes to read the file. I find that reasonable for reading every character of 2.5 million lines with 1,500 characters each. I have included my reading function below, but it doesn't seem to be causing the issue.
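For comparison with the 14-minute read, here is a minimal sketch that reads the text in large blocks with `TextReader.Read(char[], int, int)` instead of calling `Read()` once per character. It counts CRLF pairs, bare LFs and bare CRs. The names `TerminatorScan` and `Count` and the 64K buffer size are hypothetical and do not come from the original program. `StreamReader` already buffers internally, so any speedup over per-character `Read()` may be modest. The sketch mainly shows that the terminator logic still works across block boundaries, for example when a CR ends one block and its LF starts the next.

```csharp
using System;
using System.IO;

public static class TerminatorScan
{
    // Counts CRLF pairs, bare LFs and bare CRs by reading large blocks of text.
    // Note: an LF followed by a CR is counted here as one bare LF plus one bare CR,
    // whereas the original ReadMyLine reports that case as "reversed".
    public static (long crlf, long bareLf, long bareCr) Count(TextReader reader, int bufferSize = 64 * 1024)
    {
        var buffer = new char[bufferSize];
        long crlf = 0, bareLf = 0, bareCr = 0;
        bool pendingCr = false; // last char seen was CR; its LF may be in the next block
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                char c = buffer[i];
                if (pendingCr)
                {
                    pendingCr = false;
                    if (c == '\n') { crlf++; continue; } // CR LF pair completed
                    bareCr++;                            // previous CR had no LF after it
                }
                if (c == '\r') pendingCr = true;
                else if (c == '\n') bareLf++;
            }
        }
        if (pendingCr) bareCr++; // file ended right after a CR
        return (crlf, bareLf, bareCr);
    }
}
```

A caller would pass the same `StreamReader` it already opens on the file, e.g. `var (crlf, lf, cr) = TerminatorScan.Count(reader);`.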
My reading function adds each character to a string, and I then check different values in that string: for example, whether the line length is correct, whether the file contains a header, and whether the header contains the correct values. I also check specific values, such as whether character positions 403-404 are a number and whether field 1250-1300 is not null.
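Checks like these can index straight into the line without creating a substring per field. The following is a minimal sketch under stated assumptions. Positions are taken as 1-based and inclusive, so "403-404" means indexes 402 and 403. A "null" fixed-width field is assumed to mean one made up entirely of whitespace. `LineChecks`, `IsNumber` and `IsPopulated` are hypothetical names.

```csharp
using System;

public static class LineChecks
{
    // Assumption: start/end are 1-based, inclusive positions (e.g. 403-404).
    static bool InRange(string line, int start, int end) =>
        start >= 1 && start <= end && end <= line.Length;

    // True if every character in positions start..end is an ASCII digit.
    public static bool IsNumber(string line, int start, int end)
    {
        if (!InRange(line, start, end)) return false;
        for (int i = start - 1; i < end; i++)
            if (line[i] < '0' || line[i] > '9') return false;
        return true;
    }

    // True if positions start..end contain at least one non-whitespace character
    // (assumption: an all-blank fixed-width field counts as "null").
    public static bool IsPopulated(string line, int start, int end)
    {
        if (!InRange(line, start, end)) return false;
        for (int i = start - 1; i < end; i++)
            if (!char.IsWhiteSpace(line[i])) return true;
        return false;
    }
}
```

For a 1,500-character record, a caller might write `line.Length == 1500 && LineChecks.IsNumber(line, 403, 404) && LineChecks.IsPopulated(line, 1250, 1300)`.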
My question is: what can I do to figure out what is causing the slowdown and improve the efficiency of my program? I have tried checking the time at the beginning and end of each iteration of the line loop, and it doesn't seem to change. However, every 100,000 lines takes significantly longer than the previous 100,000. For example, processing lines 10,000 to 20,000 took less than 3 seconds, while lines 830,000 to 840,000 took about 35 seconds. I have considered using multiple threads, but I don't think that will help in my case, since I am reading lines sequentially from a file. Thoughts? Thanks for the help!
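One way to find where the time goes is to log elapsed time per batch alongside memory and garbage-collection figures. Batch time that climbs together with heap size or the gen-2 GC count suggests something is accumulating as the file is processed, such as a collection or string that grows with every line. Flat memory with rising time points elsewhere. The following is a minimal sketch. `BatchTimer` is a hypothetical helper, and `processLine` is a stand-in for the real per-line read and checks. `Stopwatch`, `GC.GetTotalMemory` and `GC.CollectionCount` are standard .NET APIs.

```csharp
using System;
using System.Diagnostics;

public static class BatchTimer
{
    // Calls processLine for line numbers 1..lineCount and, every batchSize lines,
    // logs the batch's elapsed time, current managed heap size and gen-2 GC count.
    public static void Run(int lineCount, int batchSize, Action<int> processLine, Action<string> log)
    {
        var batch = Stopwatch.StartNew();
        for (int i = 1; i <= lineCount; i++)
        {
            processLine(i); // hypothetical: read one line and run all checks on it
            if (i % batchSize == 0)
            {
                long heapMb = GC.GetTotalMemory(false) / (1024 * 1024);
                log($"lines {i - batchSize + 1}-{i}: {batch.ElapsedMilliseconds} ms, heap {heapMb} MB, gen2 GCs {GC.CollectionCount(2)}");
                batch.Restart();
            }
        }
    }
}
```

With `log` set to `Console.WriteLine` and `batchSize` set to 10,000, the output lines up with the batch timings quoted above.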
static void ReadMyLine(ref string currentLine, string filePath, ref int asciiValue, ref Boolean isMissingCR, ref Boolean isMissingLF, ref Boolean isReversed, ref StreamReader file)
{
    Boolean endOfRow = false;
    isMissingCR = false;
    isMissingLF = false;
    isReversed = false;
    currentLine = "";
    while (endOfRow == false)
    {
        asciiValue = file.Read();
        if (asciiValue == 10 || asciiValue == 13) // LF or CR
        {
            int asciiValueTemp = file.Peek();
            if (asciiValue == 13 && asciiValueTemp == 10) // CRLF: normal line ending
            {
                endOfRow = true;
                asciiValue = file.Read(); // consume the LF
            }
            else if (asciiValue == 10 && asciiValueTemp == 13) // CRLF Reversed
            {
                asciiValue = file.Read(); // consume the CR
                endOfRow = true;
                isReversed = true;
            }
            else if (asciiValue == 10) // Missing CR
            {
                isMissingCR = true;
                endOfRow = true;
            }
            else if (asciiValue == 13) // Missing LF
            {
                isMissingLF = true;
                endOfRow = true;
            }
            else
                endOfRow = true;
        }
        else if (asciiValue != -1)
            currentLine += char.ConvertFromUtf32(asciiValue); // append the character to the line
        else
            endOfRow = true; // end of file
    }
}
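In the function above, `currentLine += char.ConvertFromUtf32(asciiValue)` builds a brand-new string for every character. For a 1,500-character line, that means about 1,500 intermediate strings (plus the one-character strings from `char.ConvertFromUtf32`) and roughly 1.1 million character copies. That cost is per line and does not grow across the file, so on its own it may not explain the progressive slowdown. It is still worth removing. Below is a minimal sketch of the same terminator logic that appends to a reused `StringBuilder`. The `LineEnding` enum and `LineReader` class are hypothetical replacements for the `ref Boolean` flags. Appending the raw UTF-16 `char` also avoids the exception `char.ConvertFromUtf32` throws on a surrogate code unit.

```csharp
using System.IO;
using System.Text;

// How a line ended: LfCr corresponds to the original isReversed flag,
// MissingCr/MissingLf to isMissingCR/isMissingLF.
public enum LineEnding { CrLf, LfCr, MissingCr, MissingLf, EndOfFile }

public static class LineReader
{
    // Reads one line into 'line' (cleared first) and reports how it ended.
    // Returns false only if the reader was already at end of file.
    public static bool ReadMyLine(TextReader file, StringBuilder line, out LineEnding ending)
    {
        line.Clear();
        ending = LineEnding.EndOfFile;
        int c = file.Read();
        if (c == -1) return false;

        while (c != -1)
        {
            if (c == '\r' || c == '\n')
            {
                int next = file.Peek();
                if (c == '\r' && next == '\n') { file.Read(); ending = LineEnding.CrLf; }
                else if (c == '\n' && next == '\r') { file.Read(); ending = LineEnding.LfCr; }
                else if (c == '\n') ending = LineEnding.MissingCr;
                else ending = LineEnding.MissingLf;
                return true;
            }
            // Appending to a StringBuilder does not copy the whole line each time,
            // unlike currentLine += ..., which allocates a new string per character.
            line.Append((char)c);
            c = file.Read();
        }
        return true; // last line had no terminator; ending stays EndOfFile
    }
}
```

The caller would create one `StringBuilder` (for example `new StringBuilder(2048)`) outside the loop and pass it to every call. The field checks can then index into it directly, or call `ToString()` once per line.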