
I have a console app that reads in a large text file with 40k+ lines; each line is a key that I use in a search, and the results are written to an output file. The issue is that after I leave this console app running for a while, it suddenly closes. The process memory usage was really high: it was sitting at 1.6 GB when I last saw it crash.

I looked around and didn't find many answers. I did try gcAllowVeryLargeObjects, but that feels like I'm just dodging the problem.

Below is a snippet from my main() showing where I write out to the file. I can't understand why the memory usage gets so high. I flush the writer after every write (could it be because I'm keeping the file open for such a long period of time?).

TextWriter writer = new StreamWriter("output.csv", false);
foreach (var item in list)
{
    Console.WriteLine("{0}/{1}", count, numofitem);
    var result = TableServiceContext.Read(item.id);
    if (result != null)
    {
        writer.WriteLine(String.Join(",", result.id,
            result.code,
            result.hash));
    }
    count++;
    writer.Flush();
}
writer.Close();

Edit: I have 32 GB of RAM on my computer, so I am sure it's not crashing simply because the machine doesn't have enough memory.

Edit 2: I changed the name of the repository, as it was misleading.

Thao Nguyen
  • Use a memory profiler to see what is 'stuck' – leppie Dec 17 '14 at 05:30
  • What is the typical length of one line? If it is 1 KB, then 40K lines is 40 MB, and that is nothing. That's why I'm pretty sure the problem is in your repository class. If it is an EF repository, try recreating the DbContext for each line – omikad Dec 17 '14 at 05:41
  • @omikad it is just a GUID. If it helps, I am querying Azure Table Storage using lokad-cloud. – Thao Nguyen Dec 17 '14 at 05:44
  • Try putting timestamps in the Console output (you can use the Stopwatch class), and try recreating your repository every 10 or 100 or N lines. Then, looking at the timestamps, you can find the optimal N to use – omikad Dec 17 '14 at 05:51
  • @omikad I'm actually not using a repository at all (which might be a good or bad thing); I just query directly with the GUID (partition key). But I took your advice and recreated the TableServiceContext each time, which seems to speed it up. – Thao Nguyen Dec 17 '14 at 05:57
  • Great! I'm new to StackOverflow; should I post an answer so you can mark it as accepted? – omikad Dec 17 '14 at 05:59

2 Answers


If the average line length is 1 KB, then 40K lines is 40 MB, and that is nothing. That's why I'm pretty sure the problem is in your repository class. If it is an EF repository, try recreating the DbContext for each line.

If you want to tune your program, you can use the following method: put timestamps in the Console output (you can use the Stopwatch class), and try recreating your repository every 10, 100, or N lines. Then, looking at the timestamps, you can find the optimal N to use.

var timer = Stopwatch.StartNew();
...
Console.WriteLine(timer.ElapsedMilliseconds);
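
To make that concrete, here is a minimal sketch of the timing-plus-recreation idea. CreateContext() is a hypothetical factory standing in for however you construct your TableServiceContext (it is not from the original code), and the batch size N is an assumption you would tune by comparing the printed timestamps:

using System;
using System.Diagnostics;

const int N = 100;                  // assumed batch size; tune by watching the timestamps
var timer = Stopwatch.StartNew();
var context = CreateContext();      // hypothetical factory for your TableServiceContext
int count = 0;

foreach (var item in list)
{
    if (count > 0 && count % N == 0)
    {
        // Drop the old context so anything it tracks can be garbage collected.
        context = CreateContext();
        Console.WriteLine("{0} lines in {1} ms", count, timer.ElapsedMilliseconds);
    }
    var result = context.Read(item.id);
    // ... write the result as in the question ...
    count++;
}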
omikad

From looking at the code, I think the problem isn't the StreamWriter but some memory leak in your repository. Suggestions to check (a sketch follows the list):

  • Replace the repository with some dummy, e.g. a class DummyRepository with just the three properties id, value, hash.
  • Likewise, create a long "list", e.g. 40k small entries.
  • Run your program and see if it still consumes memory (I am pretty sure it will not).
  • Then, step by step, add back your original parts and see which step causes the memory leak.
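
A minimal sketch of that isolation test, just to illustrate the idea: the class name DummyRepository and its Read method are placeholders, and the property names follow the question's snippet rather than the real repository.

using System;
using System.IO;
using System.Linq;

// Placeholder stand-in for the real repository: it returns a fixed, tiny
// record per key, so any memory growth cannot come from the data source.
class DummyRepository
{
    public string id { get; set; }
    public string code { get; set; }
    public string hash { get; set; }

    public DummyRepository Read(string key)
    {
        return new DummyRepository { id = key, code = "c", hash = "h" };
    }
}

class Program
{
    static void Main()
    {
        var repo = new DummyRepository();
        // 40k synthetic keys, mirroring the GUIDs mentioned in the comments.
        var list = Enumerable.Range(0, 40000)
                             .Select(_ => Guid.NewGuid().ToString())
                             .ToList();

        using (var writer = new StreamWriter("output.csv", false))
        {
            foreach (var key in list)
            {
                var result = repo.Read(key);
                writer.WriteLine(string.Join(",", result.id, result.code, result.hash));
            }
        }
        // Watch the process in Task Manager or a profiler while this runs;
        // if memory stays flat here, the leak is in the real repository code.
    }
}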
DrKoch
  • 9,556
  • 2
  • 34
  • 43