
I am reading a text file into a DataTable and then using SqlBulkCopy to copy it to a database. While the bulk copy is fast, inserting 50,000+ lines into the DataTable is not (around 5 minutes). How do I make it efficient?

Can I insert data into the DataTable more quickly? If not, is there a way to persist the populated DataTable so I don't have to insert the data every time I run the program?

    for (; i < fares.Length; )
    {
        k = i;
        Console.WriteLine("Inserting " + k + " out of " + fares.Length);

        for (; i <= (k + 3); i++)
        {
            if (i % 4 == 0)
            {
                for (int j = 0; j < fares.Length - 1; j++)
                {
                    int space = fares[i].IndexOf(" ");
                    startStation = fares[i].Substring(0, space);
                    endStation = fares[i].Substring(space + 1, fares[i].Length - space - 1);
                }
            }
            else if (i % 4 == 1)
            {
                valueFare = fares[i];
            }
            else if (i % 4 == 2)
            {
                standardFare = fares[i];
            }
            else if (i % 4 == 3)
            {
                time = int.Parse(fares[i]);
            }
        }

        faresDT.Rows.Add(startStation, endStation, valueFare, standardFare, time);
    }

– zane
  • 50k lines isn't that much. Where's your code that creates the `DataTable`? – EdSF Jan 27 '19 at 02:57
  • Possible duplicate of [Is there a way to use SqlBulkCopy without converting the data to a DataTable?](https://stackoverflow.com/questions/1760172/is-there-a-way-to-use-sqlbulkcopy-without-converting-the-data-to-a-datatable) – mjwills Jan 27 '19 at 03:03
  • I would get rid of the DataTable completely and just read the file directly into the SqlBulkCopy, but without more example code, it’s hard to help – Tim Robinson Jan 27 '19 at 03:05
  • I just added the code for inserting the data into the DataTable – zane Jan 27 '19 at 03:08
  • That code is doing more than insert into a DataTable. Writing to the console 50,000 times will take some time, as will that parsing. All in all you are asking the wrong questions – Ňɏssa Pøngjǣrdenlarp Jan 27 '19 at 03:42

2 Answers


If what you want is to optimize the load into the database, I suggest you get rid of the DataTable completely. By making use of Marc Gravell's FastMember (and anyone who's using SqlBulkCopy should be using FastMember, IMHO) you can get a DataReader directly from any IEnumerable.

I would use some variation of the code below whenever writing from a file directly to a database. It streams the contents of the file straight into the SqlBulkCopy operation through the use of `yield return` and the lazy evaluation of IEnumerable.

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.IO;
using System.Text;
using FastMember;

namespace BulkCopyTest
{
    public class Program
    {
        public static void Main(string[] args)
        {
            const string filePath = "SOME FILE THAT YOU WANT TO LOAD TO A DB";

            WriteData(GetData<dynamic>(filePath));
        }

        private static void WriteData<T>(IEnumerable<T> data)
        {
            using (var connection = GetConnection())
            {
                // The caller must open the connection when an existing
                // SqlConnection is passed to SqlBulkCopy.
                connection.Open();

                using (var bcp = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
                using (var reader = ObjectReader.Create(data))
                {
                    SetColumnMappings<T>(bcp.ColumnMappings);
                    bcp.BulkCopyTimeout = 300;
                    bcp.BatchSize = 150000;
                    bcp.DestinationTableName = ""; //TODO: Set correct TableName
                    bcp.WriteToServer(reader);
                }
            }
        }

        private static void SetColumnMappings<T>(SqlBulkCopyColumnMappingCollection mappings)
        {
            //Setup your column mappings
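            // For example, assuming members on T named after the question's
            // columns (hypothetical names):
            //   mappings.Add("StartStation", "StartStation");
            //   mappings.Add("EndStation", "EndStation");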
        }


        private static IEnumerable<T> GetData<T>(string filePath)
        {
            using (var fileStream = File.OpenRead(filePath))
            using (var reader = new StreamReader(fileStream, Encoding.UTF8))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    //TODO: Add actual parsing logic and whatever else is needed to create an instance of T
                    yield return Activator.CreateInstance<T>();
                }
            }
        }

        private static SqlConnection GetConnection()
        {
            return new SqlConnection(new SqlConnectionStringBuilder
            {
                //TODO: Set Connection information here
            }.ConnectionString);
        }
    }
}
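
For the question's specific file format (four lines per record: a "startStation endStation" pair, a value fare, a standard fare, and a time), the TODO in GetData might be filled in along these lines. This is only a sketch: `Fare` and `GetFares` are names invented here, and the parsing assumptions come from the code in the question.

public class Fare
{
    public string StartStation { get; set; }
    public string EndStation { get; set; }
    public string ValueFare { get; set; }
    public string StandardFare { get; set; }
    public int Time { get; set; }
}

private static IEnumerable<Fare> GetFares(string filePath)
{
    using (var fileStream = File.OpenRead(filePath))
    using (var reader = new StreamReader(fileStream, Encoding.UTF8))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // First line of a record: "startStation endStation".
            int space = line.IndexOf(' ');

            yield return new Fare
            {
                StartStation = line.Substring(0, space),
                EndStation = line.Substring(space + 1),
                // The next three lines hold the two fares and the time.
                ValueFare = reader.ReadLine(),
                StandardFare = reader.ReadLine(),
                Time = int.Parse(reader.ReadLine())
            };
        }
    }
}

With a concrete type like this, Main would call WriteData(GetFares(filePath)) instead of using GetData<dynamic>, and the five members would be mapped to the destination columns in SetColumnMappings.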

– Tim Robinson

In this case I think you should take advantage of the BeginLoadData, LoadDataRow and EndLoadData methods provided by the DataTable class. You could use them like this:

try
{
    faresDT.BeginLoadData();

    // Your for loop...
    {
        // Logic defining startStation, endStation, valueFare, standardFare and time omitted for brevity.
        faresDT.LoadDataRow(new object[] {startStation, endStation, valueFare, standardFare, time}, true);
    }
}
finally
{
    faresDT.EndLoadData();
}

What BeginLoadData() does is turn off some of the processing (notifications, index maintenance, and constraint checks) that happens every time you add a row; that work is deferred until you call EndLoadData() once you are done loading.
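
Applied to the question's four-lines-per-record parsing (and with the per-row Console.WriteLine removed, which the comments point out is itself costly), a minimal sketch might look like this; `filePath` and the column order of faresDT are assumed from the question:

string[] fares = File.ReadAllLines(filePath);

try
{
    faresDT.BeginLoadData();

    for (int i = 0; i + 3 < fares.Length; i += 4)
    {
        // "startStation endStation" on the first line of each record.
        int space = fares[i].IndexOf(' ');

        faresDT.LoadDataRow(new object[]
        {
            fares[i].Substring(0, space),  // startStation
            fares[i].Substring(space + 1), // endStation
            fares[i + 1],                  // valueFare
            fares[i + 2],                  // standardFare
            int.Parse(fares[i + 3])        // time
        }, true);
    }
}
finally
{
    faresDT.EndLoadData();
}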

You can find more details about these APIs here: https://learn.microsoft.com/en-us/dotnet/api/system.data.datatable.loaddatarow?view=netframework-4.7.2

– yv989c