5

I have the following (simplified) code which I'd like to optimise for speed:

long inputLen = 50000000; // 50 million 
DataTable dataTable = new DataTable();
DataRow dataRow;
object[] objectRow = new object[4]; // one reusable slot per destination column
while (inputLen-- > 0)
{
    objectRow[0] = ...
    objectRow[1] = ...
    objectRow[2] = ...

    // Generate output for this input
    output = ...

    for (int i = 0; i < outputLen; i++) // outputLen can range from 1 to 20,000
    {
         objectRow[3] = output[i];
         dataRow = dataTable.NewRow();
         dataRow.ItemArray = objectRow;
         dataTable.Rows.Add(dataRow);
    }
}

// Bulk copy
SqlBulkCopy bulkTask = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null);
bulkTask.DestinationTableName = "newTable";
bulkTask.BatchSize = dataTable.Rows.Count;
bulkTask.WriteToServer(dataTable);
bulkTask.Close();

I'm already using SqlBulkCopy in an attempt to speed things up, but it appears that populating the DataTable itself is what's slow.

I don't know how DataTables work internally, so I'm wondering whether I'm creating unnecessary overhead by first creating a reusable array, then assigning it to a DataRow, and then adding the DataRow to the DataTable. Or is using a DataTable not optimal in the first place? The input comes from a database.

I don't care much about LOC, just speed. Can anyone give some advice on this?

David Tang
  • For an example implementation, look at `SimpleDataReader` [here](https://groups.google.com/group/microsoft.public.dotnet.languages.csharp/msg/b1d70b504cdee2ad?hl=en) – Marc Gravell Feb 01 '11 at 08:20

3 Answers

13

For such a big table, you should instead use the

public void WriteToServer(IDataReader reader)

method.

It may mean you'll have to implement a "fake" IDataReader yourself around your code (if you don't get the data from an existing IDataReader), but this way you'll get "streaming" from end to end and will avoid a 200 million row loop.
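
As a rough sketch (not the `SimpleDataReader` from Marc's link), such a "fake" reader only really needs the members SqlBulkCopy touches when mapping columns by ordinal: `Read`, `GetValue` and `FieldCount`; everything else can throw. The `IEnumerable<object[]>` source, the `GeneratedRowReader` name and the fixed column count of 4 are assumptions standing in for your own row-generation code:

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Minimal IDataReader over a lazy sequence of object[] rows, so SqlBulkCopy can
// pull rows one at a time instead of reading them from a pre-built DataTable.
public sealed class GeneratedRowReader : IDataReader
{
    private readonly IEnumerator<object[]> _rows;
    private object[] _current;
    private bool _closed;

    public GeneratedRowReader(IEnumerable<object[]> rows)
    {
        _rows = rows.GetEnumerator();
    }

    // The members SqlBulkCopy actually relies on for ordinal mapping
    public int FieldCount { get { return 4; } }          // columns 0..3, as in the question
    public bool Read()
    {
        if (!_rows.MoveNext()) return false;
        _current = _rows.Current;
        return true;
    }
    public object GetValue(int i) { return _current[i]; }

    // Housekeeping
    public void Close() { _closed = true; }
    public void Dispose() { Close(); }
    public bool IsClosed { get { return _closed; } }
    public int Depth { get { return 0; } }
    public int RecordsAffected { get { return -1; } }
    public bool NextResult() { return false; }
    public DataTable GetSchemaTable() { return null; }

    // Only needed if you add column mappings by name
    public int GetOrdinal(string name) { throw new NotSupportedException(); }
    public string GetName(int i) { return "column" + i; }
    public Type GetFieldType(int i) { return typeof(object); }

    // Remaining IDataRecord members aren't called in this scenario
    public object this[int i] { get { return GetValue(i); } }
    public object this[string name] { get { throw new NotSupportedException(); } }
    public bool GetBoolean(int i) { throw new NotSupportedException(); }
    public byte GetByte(int i) { throw new NotSupportedException(); }
    public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferOffset, int length) { throw new NotSupportedException(); }
    public char GetChar(int i) { throw new NotSupportedException(); }
    public long GetChars(int i, long fieldOffset, char[] buffer, int bufferOffset, int length) { throw new NotSupportedException(); }
    public IDataReader GetData(int i) { throw new NotSupportedException(); }
    public string GetDataTypeName(int i) { throw new NotSupportedException(); }
    public DateTime GetDateTime(int i) { throw new NotSupportedException(); }
    public decimal GetDecimal(int i) { throw new NotSupportedException(); }
    public double GetDouble(int i) { throw new NotSupportedException(); }
    public float GetFloat(int i) { throw new NotSupportedException(); }
    public Guid GetGuid(int i) { throw new NotSupportedException(); }
    public short GetInt16(int i) { throw new NotSupportedException(); }
    public int GetInt32(int i) { throw new NotSupportedException(); }
    public long GetInt64(int i) { throw new NotSupportedException(); }
    public string GetString(int i) { return (string)GetValue(i); }
    public int GetValues(object[] values) { _current.CopyTo(values, 0); return _current.Length; }
    public bool IsDBNull(int i) { return GetValue(i) == null || GetValue(i) is DBNull; }
}
```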

Simon Mourier
  • Does that mean SQLBulkCopy can still write efficiently to the database **as** the rows are being constructed? The input is coming from the same database, but for each row I'm potentially creating 20000 new ones. – David Tang Feb 01 '11 at 08:16
  • 1
    @Box9: Yes, the data doesn't need to be there before you call SQLBulkCopy. We had a "Legacy data take on" application here that would port 1 million rows in about an hour (BCP out 1 table, format several bcp files, bcp those files in). We recently needed it to run on about 44 million rows, I got the 1 hour down to 10 minutes by implementing several IDataReader that read data from the source DB and formatted the rows as SQLBulkCopy asked for them. – Binary Worrier Feb 01 '11 at 08:21
4

Instead of holding a huge DataTable in memory, I would suggest implementing an IDataReader which serves up the data as the bulk copy goes. This will reduce the need to keep everything in memory up front, and should thus improve performance.
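
To make that concrete, the streaming shape could look roughly like the sketch below, reusing the `GeneratedRowReader` idea from the answer above. `LoadInputs`, `GenerateOutput` and the `input.A/B/C` properties are hypothetical placeholders for the question's own input and output logic:

```csharp
// Hypothetical wiring: rows are produced lazily with yield return and pulled by
// SqlBulkCopy one at a time, so the full result set never sits in memory.
static IEnumerable<object[]> ProduceRows()
{
    foreach (var input in LoadInputs())            // placeholder for the 50 million inputs
    {
        var output = GenerateOutput(input);        // placeholder for "output = ..."
        for (int i = 0; i < output.Length; i++)
            yield return new object[] { input.A, input.B, input.C, output[i] };
    }
}

static void BulkInsert(SqlConnection connection)
{
    using (var bulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
    using (var reader = new GeneratedRowReader(ProduceRows()))
    {
        bulk.DestinationTableName = "newTable";
        bulk.BulkCopyTimeout = 0;                  // optional: no timeout for a long-running copy
        bulk.WriteToServer(reader);                // rows stream as Read() is called
    }
}
```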

Peter Lillevold
0

You should not construct the entire DataTable in memory. Use the overload of WriteToServer that takes an array of DataRow, and just split your data into chunks.
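
A rough sketch of what the chunked approach might look like (`chunkSize` is illustrative, `ProduceRows` stands for any lazy row source such as the iterator sketched in the first answer, and `dataTable` is assumed to already have its four columns defined):

```csharp
// Illustrative chunking: keep at most chunkSize rows in the DataTable, flush
// them with the WriteToServer(DataRow[]) overload, then clear and reuse it.
const int chunkSize = 10000;

using (var bulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
{
    bulk.DestinationTableName = "newTable";

    foreach (object[] values in ProduceRows())
    {
        DataRow row = dataTable.NewRow();
        row.ItemArray = values;
        dataTable.Rows.Add(row);

        if (dataTable.Rows.Count == chunkSize)
        {
            var chunk = new DataRow[chunkSize];
            dataTable.Rows.CopyTo(chunk, 0);
            bulk.WriteToServer(chunk);             // the DataRow[] overload
            dataTable.Clear();                     // drop the rows we just copied
        }
    }

    if (dataTable.Rows.Count > 0)                  // flush the final partial chunk
    {
        var tail = new DataRow[dataTable.Rows.Count];
        dataTable.Rows.CopyTo(tail, 0);
        bulk.WriteToServer(tail);
    }
}
```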

gor
  • Isn't that method still constructing it in memory? And besides, if I'm not running out of memory, isn't constructing it in memory the fastest? – David Tang Feb 01 '11 at 08:13
  • If you construct only 1k rows at a time instead of 200M, it is much easier for the memory manager to handle that amount of data. Besides this, you will almost certainly get swapping with 200M records held in memory at one time. – gor Feb 01 '11 at 08:16