1

We have a huge list (say 100,000) which needs to be converted to DataTable for SqlBulkcopy.

Can you guide what could be the fastest way to do this without using for loop? For now we are doing it like - in below code listDos is object list

using (var dataTable = new DataTable(dataTableName))
{
    dataTable.Locale = CultureInfo.CurrentCulture;
    var columns = new[]
        {
             new DataColumn("Id", typeof(int)),
             new DataColumn("FkId", typeof(int)),
             new DataColumn("Status", typeof(string)),
             new DataColumn("RecordFrom", typeof(DateTime))
        };

    dataTable.Columns.AddRange(columns);


    foreach (ObjectDo listDo in listDos)
    {
        var row = dataTable.NewRow();

        if (rebuildDo.Id != null) row["Id"] = rebuildDo.Id;

        if (rebuildDo.FkId!= null) row["FkId"] = rebuildDo.FkId;

        row["Status"] = rebuildDo.Status;

        row["RecordFrom"] = rebuildDo.RecordFrom;

        dataTable.Rows.Add(row);
    }

    return dataTable;
}
Oxygen
  • 831
  • 4
  • 17
  • 42
  • If you want linq please show a linq effort. Also the linq query will not have the syntax of a loop but it does loop (iterate) over all items.. – Gilad Green Jan 16 '18 at 14:20
  • 3
    why "without for loop"? that seems a very arbitrary restriction... since you're dealing with lots of data, it is expected that something somewhere is going to need to loop... – Marc Gravell Jan 16 '18 at 14:25
  • I'm voting to close this question as off-topic because there is no problem to solve (only how to improve) and perhaps belongs on another Stack Exchange site. Plus motives are not explained –  Jan 16 '18 at 14:26
  • @ScottHannen it's not the loop, it's the duplication of data that will cause problems. Especially if there are a lot of data. – Panagiotis Kanavos Jan 16 '18 at 14:26
  • @MickyD on the contrary, this is a significant concern when using SqlBulkCopy. If you have 10K items, creating a DataTable with them doubles the memory usage *and* creates a lot of temporary objects that have to be garbage collected. – Panagiotis Kanavos Jan 16 '18 at 14:28
  • why are you creating a collection of `ObjectDo` and iterating over that to create a `DataTable`? instead of this fill your `DataTable` first then no iteration will be needed. – Khairul Islam Jan 16 '18 at 14:29
  • @KhairulIslam that's already explained. The OP wants to use SqlBulkCopy, not to create a DataTable. SqlBulkCopy works with DataTables or IDataReader instances. DataTable is rather cumbersome as a generic data structure which means *avoiding* it is a good idea. The FastMember library can expose an IDataReader interface on top of any IEnumerable – Panagiotis Kanavos Jan 16 '18 at 14:30
  • @KhairulIslam most times, people don't *have* a `DataTable` - they have typed data. If the intent of the `DataTable` is just to let the list interact with `SqlBulkCopy`, then this requirement makes sense, and I would actively advocate *against* working in a `DataTable`-first model. It is 2018, after all. – Marc Gravell Jan 16 '18 at 14:31
  • @PanagiotisKanavos Perhaps. Question is arguably too broad; does not mention operational requirements or desired throughput; _"fastest way"_ generally never leads to answers showing actual benchmarks to back-up claims –  Jan 16 '18 at 14:31
  • @MickyD read the first line again. It's actually very specific. If you've actually worked with SqlBulkCopy, had to send a lot of data to the database or simply check [SqlBulkCopy](https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy(v=vs.110).aspx) you'd understand the issue immediatelly. It's no accident that Marc Gravel and I keep posting similar answers and comments. – Panagiotis Kanavos Jan 16 '18 at 14:34
  • @PanagiotisKanavos I understand that and respect both you and Marc but I can't help but think this is an example of ignoring the rules when it pleases –  Jan 16 '18 at 14:39
  • The underlying question seems quite valid to me, perhaps it just needs to be re-worded. Especially given that the answer is 3 times faster than the OPs method, and this is probably a somewhat common problem that could help many people. Sometimes rules are made to be broken. Other times, they are just guidelines. ;) – Rufus L Jan 17 '18 at 22:16
  • _[Can I post questions about optimizing code on Stack Overflow?....No, it isn't the best place - though it is on-topic, there's a better place for such questions. You can take working code to Code Review - but do make sure to read their help center to see what exactly they expect from a good question.](http://meta.stackoverflow.com/questions/261841/can-i-post-questions-about-optimizing-code-on-stack-overflow)_ –  Jan 18 '18 at 14:10

1 Answers1

8

The fastest way would be: not to.

SqlBulkCopy can take an IDataReader. "FastMember" (on NuGet) can expose a List<T> as an IDataReader:

List<ObjectDo> listDos = ...
using(var bcp = new SqlBulkCopy(connection)) 
using(var reader = ObjectReader.Create(listDos,
    "Id", "FkId", "Status", "RecordFrom")) 
{ 
  bcp.DestinationTableName = "SomeTable"; 
  bcp.WriteToServer(reader); 
}

This is now a direct reader over the list: no duplication of all the contents.

Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
  • Thanks for the inputs. Tried this and it seems almost three times faster than going with SqlBulkCopy and data table.. Just for curiosity, why ObjectReader is faster? – Oxygen Jan 16 '18 at 16:15
  • @Oxygen possibly because it is simpler? `DataTable` has quite a lot of overhead, including the fact that it actually stores fields vertically rather than horizontally (which it does to avoid having to do lots of boxing) - so you actually get pretty poor memory locality. Note that without seeing a specific performance test: I can only speculate – Marc Gravell Jan 16 '18 at 16:19
  • @Mac. Need one more small help. I have miss match in class properties and table column names, is it mandatory that reader should match table properties? Also if table contains 10 columns, how can I create Object reader with only required 5 columns? – Oxygen Jan 17 '18 at 12:37
  • @Oxygen look at `bcp.ColumnMappings`. Add mapping between the desired source and destination columns. – Marc Gravell Jan 17 '18 at 12:48
  • Thank you so much for all your time and direction. – Oxygen Jan 17 '18 at 14:18