
I am loading about 4 million records from the database and processing them one row at a time. To do so, I iterate through each DataRow of the DataSet, which is causing performance problems.

How can I make this faster? I tried using a Parallel.ForEach loop. However, since DataRows aren't thread safe, adding lock blocks around the places where I write to them made performance worse. The per-row logic is about 3,000 lines long, so each row has to run through all 3,000 lines.

Is there any way to make this more efficient? I was thinking about using a List of DataRows instead of working against the DataTable directly. If I make that change, will I be able to use a Parallel.ForEach loop and expect better performance?

Or should I create an entity class for the report and build a List of that class? Which would be faster: a List of System.Data.DataRow or a List of the entity class?
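For reference, here is roughly what I have in mind for the entity-class approach (the class, variable, column, and method names are just placeholders):

    public class ReportRow
    {
        public int ColOne { get; set; }
        public int ColTwo { get; set; }
        public int ColThree { get; set; }
        public int Derived { get; set; }   // new value computed per row
    }

    // Project the DataTable into plain objects once (requires System.Linq and
    // System.Data.DataSetExtensions), then process them in parallel. Each
    // ReportRow is owned by a single iteration, so no locks are needed.
    List<ReportRow> rows = dataTable.AsEnumerable()
        .Select(r => new ReportRow
        {
            ColOne = r.Field<int>("ColOne"),
            ColTwo = r.Field<int>("ColTwo"),
            ColThree = r.Field<int>("ColThree")
        })
        .ToList();

    Parallel.ForEach(rows, row =>
    {
        row.Derived = Calculate(row);   // placeholder for the ~3,000 lines of per-row logic
    });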

I do understand that this is a design problem, but there isn't much I can change at that level. I'd appreciate any kind of help. Thank you.

2 Answers


I would recommend using a DataReader to deal with one row at a time instead of loading 4 million rows into a DataTable. That should considerably speed up the loading process.

To speed up processing, you could wrap the 3,000 lines of C# in an async method that is called for each row, and use Task.WhenAll to run those calls in parallel. You can raise System.Net.ServicePointManager.DefaultConnectionLimit to increase the degree of parallelism.
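For example, the limit could be raised once at application startup (the value here is arbitrary):

    // Assumption: 100 is an arbitrary value; tune it to your environment.
    System.Net.ServicePointManager.DefaultConnectionLimit = 100;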

Here is what the loading code may look like:

    public async Task LoadData(SqlConnection connection)
    {
        using (connection)
        {
            // Stream the rows with a DataReader instead of filling a DataTable.
            SqlCommand command = new SqlCommand(
                "SELECT colOne, colTwo, colThree FROM SqlTable;",
                connection);
            connection.Open();

            SqlDataReader reader = command.ExecuteReader();
            var tasks = new List<Task>();

            if (reader.HasRows)
            {
                while (reader.Read())
                {
                    // Start processing each row without awaiting it yet.
                    tasks.Add(ProcessData(reader.GetInt32(0), reader.GetInt32(1), reader.GetInt32(2)));
                }

                // Wait for every row's processing task to finish.
                await Task.WhenAll(tasks);
            }
            else
            {
                Console.WriteLine("No rows found.");
            }
            reader.Close();
        }
    }

    public async Task ProcessData(int one, int two, int three)
    {
        // lots of processing

        // open a new connection to the database to update the row and close it.
    }

Hope that helps.

Shahid Syed
  • Thank you for your response. Isn't a DataReader a read-only, forward-only cursor? I also need to write some data into the DataSet. For example, say my DataTable has 3 columns; based on the combination of the 3 column values, I need to create a new column in the DataTable and assign a value to it (read and write). Also, I don't think it would be good practice to hold a connection open for, say, 40 minutes, since a DataReader requires an open connection for its whole lifetime. – Ashish Samant Jun 18 '20 at 15:39
  • 1
    I missed the part about writing back. Are you saving the new column back to database or using only in UI? – Shahid Syed Jun 18 '20 at 15:46
  • The report is being called from 3 separate modules. In two modules we're displaying it in the UI, but in the third one, we're writing it back to the database. – Ashish Samant Jun 19 '20 at 08:15

DataTables are "safe for multithreaded read operations. You must synchronize any write operations."

https://learn.microsoft.com/en-us/dotnet/api/system.data.datatable?view=netcore-3.1

But it's much better if you can avoid loading 4M rows into a DataTable at all. If you can get the results ordered by the database, you can perform the grouping efficiently as you initially load the data.
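For example, assuming the report is grouped by colOne and the query is ordered by that key, something along these lines streams the rows and aggregates each group as it goes by, without ever building a DataTable (the connection string, column names, and the per-row calculation are placeholders):

    // ORDER BY the grouping key so each group arrives contiguously; then one
    // running aggregate per group is all that needs to be kept in memory.
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        "SELECT colOne, colTwo, colThree FROM SqlTable ORDER BY colOne;", connection))
    {
        connection.Open();
        using (var reader = command.ExecuteReader())
        {
            int? currentKey = null;
            long runningTotal = 0;

            while (reader.Read())
            {
                int key = reader.GetInt32(0);
                if (currentKey.HasValue && key != currentKey.Value)
                {
                    // The key changed, so the previous group is complete.
                    Console.WriteLine($"{currentKey}: {runningTotal}");
                    runningTotal = 0;
                }

                currentKey = key;
                runningTotal += reader.GetInt32(1) + reader.GetInt32(2);   // placeholder calculation
            }

            if (currentKey.HasValue)
                Console.WriteLine($"{currentKey}: {runningTotal}");   // emit the final group
        }
    }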

David Browne - Microsoft
  • Hi David. Apart from just loading the data into a DataTable, I also need to create, say, a new column in the DataTable and assign it a value based on the values already present in the DataTable (loaded from the DB). I couldn't really understand what you meant in the last line; could you please elaborate? – Ashish Samant Jun 18 '20 at 15:43
  • Define a DataTable or class not for the results of the query, but for what the report needs. Then use a DataReader to stream, group, and calculate the results as you read them from the source. – David Browne - Microsoft Jun 18 '20 at 17:55