5

This question is NOT about parsing a CSV.

I am using the following code to create a CSV from a DataTable, but it is slow: 100 rows by 14 columns takes 4 seconds. Is there a faster way?

StringBuilder sb = new StringBuilder();
bool first = true;
int colCount = 0;
foreach (DataColumn dc in DT.Columns)
{
    if (first) first = false; else sb.Append(",");
    sb.Append("\"" + dc.ColumnName + "\"");
    colCount++;
}
sb.AppendLine();
foreach (DataRow dr in DT.Rows)
{  
    first = true;
    for (int i = 0; i < colCount; i++)
    {
        if (first) first = false; else sb.Append(",");
        sb.Append("\"" + dr[i].ToString().Trim() + "\"");
    }
    sb.AppendLine();
}
return sb.ToString();

StringBuilder is not the problem here. Appending i from 0 to 1 million runs in 300 milliseconds:

StringBuilder sb = new StringBuilder();
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 1000000; i++)
{
    sb.Append(i.ToString());
}
sw.Stop();
Debug.Write(sw.ElapsedMilliseconds.ToString());
paparazzo
    @DourHighArch The question is creating a CSV. NOT parsing. There is nothing in this question about parsing a CSV. – paparazzo Feb 14 '13 at 20:09
  • 1
    My only comment would be that you are writing your output into a StringBuilder - why not write directly to a stream? – Matt Feb 14 '13 at 20:31
  • Not sure how much you would gain, but you could eliminate the string expressions, i.e. change `sb.Append("\"" + dc.ColumnName + "\"");` to `sb.Append('"'); sb.Append(dc.ColumnName); sb.Append('"');`. Same in your second loop. Also, single character literals may be faster than strings, where applicable. Finally (nit) `first` in your second loop can be eliminated in favor of a check against `i==0`. – 500 - Internal Server Error Feb 14 '13 at 21:55
  • @Matt I take out SB calls and the time is the same. Loading a SB with 40,000 characters is not what is taking 4 seconds. – paparazzo Feb 14 '13 at 21:56
  • @500-InternalServerError But would not (i==0) take longer than (first) – paparazzo Feb 14 '13 at 22:11
  • 1
    No, at the CPU level the complexity of the two is equivalent. – 500 - Internal Server Error Feb 14 '13 at 22:13
  • @500-InternalServerError I tested and you are correct – paparazzo Feb 14 '13 at 22:42
  • Apologies @Blam, I misunderstood your question. You may remove the proposed answer, or leave it as a warning to others who may not read the questions correctly. – Dour High Arch Feb 14 '13 at 23:09
  • that does seem kinda odd that it's taking that long.. What exactly are in those columns?? massive blocks of data that require transformation or just simple int,string, columns. some example of the data itself might help. I have a project where I write CSV's from IDataReaders and it's reasonable even with millions of rows. – Paul Farry Feb 15 '13 at 04:04
  • @PaulFarry One report that is just 11053 characters takes 2.7 seconds. Yes I know DataReader is much faster. That is why I am confused with this speed. – paparazzo Feb 15 '13 at 09:45
  • 1
    Haven't got an answer for you, but I mocked up some code and I found no issue with the datatable stuff. Are you sure your DataTable is the actual problem and not something else in your application maxing out the CPU/IO and not giving your code enough of a chance to run?.. the example i did is here https://gist.github.com/PaulFarry/6e5e1e80f484cfad1218 – Paul Farry Feb 15 '13 at 12:11
  • @PaulFarry Yes your sample does run fast on my PC. Did not test it on the server. Not sure what is going on. It Fills from a DataAdapter but that should not make a difference. It is running on a server that is at high CPU but other parts of the app are responsive. I will reboot and test again when the server is idle. – paparazzo Feb 15 '13 at 14:56
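Pulling the comment suggestions together: appending single characters instead of concatenated strings, replacing the `first` flag with an `i > 0` check, and writing straight to a `TextWriter` rather than a `StringBuilder`. A minimal sketch of that version (not measured, and the `WriteCsv` name and `writer` parameter are illustrative):

```csharp
// Sketch combining the comment suggestions: char appends instead of
// string concatenation, i > 0 instead of a 'first' flag, and output
// written directly to a TextWriter (e.g. a StreamWriter).
static void WriteCsv(DataTable dt, TextWriter writer)
{
    for (int i = 0; i < dt.Columns.Count; i++)
    {
        if (i > 0) writer.Write(',');
        writer.Write('"');
        writer.Write(dt.Columns[i].ColumnName);
        writer.Write('"');
    }
    writer.WriteLine();

    foreach (DataRow dr in dt.Rows)
    {
        for (int i = 0; i < dt.Columns.Count; i++)
        {
            if (i > 0) writer.Write(',');
            writer.Write('"');
            writer.Write(dr[i].ToString().Trim());
            writer.Write('"');
        }
        writer.WriteLine();
    }
}
```

Note this keeps the original code's behavior of quoting every field without escaping embedded quotes; per the comment thread, none of this is likely to account for the 4 seconds.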

2 Answers

11

There are faster ways of doing string concatenation and some of the other logic. Using the string builder to build up the whole thing may be part of the slowdown too. A library would have some of these things thought out already and would probably perform faster, as some have suggested.

Here is some code using CsvHelper (which I wrote).

using( var dt = new DataTable() )
{
    dt.Load( dataReader );
    foreach( DataColumn column in dt.Columns )
    {
        csv.WriteField( column.ColumnName );
    }
    csv.NextRecord();

    foreach( DataRow row in dt.Rows )
    {
        for( var i = 0; i < dt.Columns.Count; i++ )
        {
            csv.WriteField( row[i] );
        }
        csv.NextRecord();
    }
}

If you have the DataReader, you don't even need the DataTable, which should speed things up further.

var hasHeaderBeenWritten = false;
while( dataReader.Read() )
{
    if( !hasHeaderBeenWritten )
    {
        for( var i = 0; i < dataReader.FieldCount; i++ )
        {
            csv.WriteField( dataReader.GetName( i ) );
        }
        csv.NextRecord();
        hasHeaderBeenWritten = true;
    }

    for( var i = 0; i < dataReader.FieldCount; i++ )
    {
        csv.WriteField( dataReader[i] );
    }
    csv.NextRecord();
}
Josh Close
  • 3
    Thanks for the example, it's missing the `using (var csv = new CsvWriter(writer))` bit, which might confuse some people – codeulike Dec 09 '19 at 16:14
-2

What you have there isn't "great" code... I think most people would suggest using something like the CsvHelper NuGet package. However, I'm also going to say that that code isn't what is taking 4 seconds with only 100 rows. How long does it take to get the data into the DataTable? I'd guess that is where most of the 4 seconds is spent.

Al W
  • 4 seconds is the time to extract text from an EXISTING DataTable. Not the time to create the DataTable. And show me how CSVHelper reads a DataTable. – paparazzo Feb 15 '13 at 00:05