50

I am creating a method in C# which generates a text file for a Google Product Feed. The feed will contain upwards of 30,000 records, and the text file currently weighs in at ~7 MB.

Here's the code I am currently using (some lines removed for brevity's sake).

public static void GenerateTextFile(string filePath) {

  var sb = new StringBuilder(1000);
  sb.Append("availability").Append("\t");
  sb.Append("condition").Append("\t");
  sb.Append("description").Append("\t");
  // repetitive code hidden for brevity ...
  sb.Append(Environment.NewLine);

  var items = inventoryRepo.GetItemsForSale();

  foreach (var p in items) {
    sb.Append("in stock").Append("\t");
    sb.Append("used").Append("\t");
    sb.Append(p.Description).Append("\t");
    // repetitive code hidden for brevity ...
    sb.AppendLine();
  }

  using (StreamWriter outfile = new StreamWriter(filePath)) {
      result.Append("Writing text file to disk.").AppendLine(); // 'result' is a separate logging StringBuilder, declared in code removed for brevity
      outfile.Write(sb.ToString());
  }
}

I am wondering if StringBuilder is the right tool for the job. Would there be performance gains if I used a TextWriter instead?

I don't know a ton about IO performance so any help or general improvements would be appreciated. Thanks.

jessegavin
  • Since the time I wrote this question, the Linq2Csv project came to life. It is a much better way to handle the code I was writing. http://nuget.org/packages/LinqToCsv – jessegavin Apr 20 '12 at 13:55
  • Any full source code with a solution? – Kiquenet Aug 14 '12 at 09:19
  • Sorry, it was written for one of my clients. You should really look into Linq2Csv. It will make this sort of thing a lot easier. – jessegavin Aug 14 '12 at 14:27
  • Almost 5 years since my last comment on this question I would highly recommend CsvHelper. https://joshclose.github.io/CsvHelper/ – jessegavin Jan 11 '17 at 03:29
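For readers landing here now, a rough sketch of what the CsvHelper approach mentioned above might look like for a tab-delimited feed. The `FeedRow` type and `WriteFeed` method are hypothetical stand-ins for the real inventory classes, and the configuration API differs a little between CsvHelper versions:

using System.Collections.Generic;
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;

// Hypothetical record type; the real inventory item class will differ.
public class FeedRow {
    public string Availability { get; set; }
    public string Condition { get; set; }
    public string Description { get; set; }
}

public static class FeedExport {
    public static void WriteFeed(string filePath, IEnumerable<FeedRow> rows) {
        var config = new CsvConfiguration(CultureInfo.InvariantCulture) {
            Delimiter = "\t" // Google Product Feeds are tab-delimited
        };
        using (var writer = new StreamWriter(filePath))
        using (var csv = new CsvWriter(writer, config)) {
            csv.WriteRecords(rows); // streams records to disk; headers come from the property names
        }
    }
}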

4 Answers

81

File I/O operations are generally well optimized in modern operating systems. You shouldn't try to assemble the entire string for the file in memory ... just write it out piece by piece. The FileStream will take care of buffering and other performance considerations.

You can make this change easily by moving:

using (StreamWriter outfile = new StreamWriter(filePath)) {

to the top of the function, getting rid of the StringBuilder and writing directly to the file instead.
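A rough sketch of the method rewritten that way, writing each field directly to the StreamWriter (the repository call and item properties are carried over from the question, and the omitted columns are written the same way):

public static void GenerateTextFile(string filePath) {

  // StreamWriter buffers internally, so each Write lands in an in-memory
  // buffer and is flushed to the underlying FileStream in chunks.
  using (StreamWriter outfile = new StreamWriter(filePath)) {

    outfile.Write("availability\t");
    outfile.Write("condition\t");
    outfile.Write("description\t");
    // repetitive code hidden for brevity ...
    outfile.Write(Environment.NewLine);

    var items = inventoryRepo.GetItemsForSale();

    foreach (var p in items) {
      outfile.Write("in stock\t");
      outfile.Write("used\t");
      outfile.Write(p.Description + "\t");
      // repetitive code hidden for brevity ...
      outfile.WriteLine();
    }
  }
}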

There are several reasons why you should avoid building up large strings in memory:

  1. It can actually perform worse, because the StringBuilder has to increase its capacity as you write to it, resulting in reallocation and copying of memory.
  2. It may require more memory than you can physically allocate - which may result in the use of virtual memory (the swap file) which is much slower than RAM.
  3. For truly large files (> 2 GB) you will run out of address space (on 32-bit platforms) and the write will never complete.
  4. To write the StringBuilder contents to a file you have to use ToString() which effectively doubles the memory consumption of the process since both copies must be in memory for a period of time. This operation may also fail if your address space is sufficiently fragmented, such that a single contiguous block of memory cannot be allocated.
LBushkin
  • Nice answer. Tuning may be tried using the StreamWriter constructor overload that lets you define the bufferSize... – João Aug 04 '10 at 15:58
  • Hey thanks for your answer! I appreciate you taking the time to add some further explanation about how to handle this sort of scenario. – jessegavin Aug 04 '10 at 16:34
  • 5 years later... is the `FileStream` class still the best method of writing text files ~7 MB? – n00dles Oct 22 '15 at 15:45
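For reference, the constructor overload João mentions takes the buffer size as its last argument. A minimal sketch; the 64 KB figure is purely illustrative, not a recommendation:

using System.IO;
using System.Text;

class Example {
    static StreamWriter OpenFeedWriter(string filePath) {
        // path, append, encoding, bufferSize (in bytes)
        return new StreamWriter(filePath, false, Encoding.UTF8, 64 * 1024);
    }
}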
27

Just move the using statement so it encompasses the whole of your code, and write directly to the file. I see no point in keeping it all in memory first.

Jon Skeet
13

Write one string at a time using StreamWriter.Write rather than caching everything in a StringBuilder.

Alex Humphrey
  • I really hope you don't mean for him to write one *bit* at a time. – JSBձոգչ Aug 04 '10 at 15:43
  • While this was a good answer, I have a file that is approximately 20 MB in size, and the issue I am facing is that StreamWriter actually puts a carriage return/new line at the end. I am trying to remove that extra carriage return at the very end, and as was already pointed out, StringBuilder is not a great solution for performance or size. I tried StreamReader.Peek() to peek at the line before it reaches the end. Any ideas? – petrosmm Jul 30 '15 at 10:56
  • @MaximusPeters You probably found your way in the meantime, but perhaps you were using the `WriteLine()` method instead of `Write()`? – Stéphane Gourichon Jan 14 '16 at 19:37
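If the trailing newline raised in the comments above is a concern, one common pattern is to write the line separator before every record except the first, so nothing is appended after the last one. A minimal sketch; the `WriteWithoutTrailingNewline` helper and its parameters are hypothetical:

using System;
using System.Collections.Generic;
using System.IO;

static class FeedWriter {
    // Writes the separator *before* every line except the first,
    // so the file does not end with a trailing newline.
    public static void WriteWithoutTrailingNewline(string filePath, IEnumerable<string> lines) {
        using (var outfile = new StreamWriter(filePath)) {
            bool first = true;
            foreach (var line in lines) {
                if (!first)
                    outfile.Write(Environment.NewLine);
                outfile.Write(line); // Write, not WriteLine, so nothing extra is appended
                first = false;
            }
        }
    }
}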
4

This might be old, but I had a file to write with about 17 million lines, so I ended up batching the writes every 10k lines, along the lines of the code below.

// counter (int) and lineout (string) are declared outside the loops;
// "ball" and "whatever" are placeholders from my real code.
for (i6 = 1; i6 <= ball; i6++)
{ // this is the middle of a 6-deep nest of loops
  counter++;

  // build the string of fields, with \n at the end of each line
  lineout = lineout + whatever;

  // every 10,000 lines (remainder operator %), flush the batch to disk
  if (counter % 10000 == 0)
  {
     using (StreamWriter outFile = new StreamWriter(filepath, true)) // true = append
     {
         // write the 10k lines with Write, NOT WriteLine
         outFile.Write(lineout);
     }
     // reset the string so we don't blow up memory
     lineout = "";
  }
}

In my case it was MUCH faster than writing one line at a time.

mxdog