
So I'm trying to build a comparison benchmark between CosmosDB and SQL Server. So far, I've loaded a SQL Server database by pumping a table full of test records, 250 GB of them (yes, id is the primary key and has a clustered index). Now I'm attempting to delete millions of records as fast as possible, even hundreds of millions if I can; to date I've only managed about 10M an hour using a single thread.
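For reference, the single-threaded approach I'm measuring against boils down to a TOP-limited delete in a loop, roughly like this (a sketch, not my exact code; DeleteBatched and the maxId cutoff are illustrative):

    // Single-connection batched delete: remove up to 4,999 rows per statement
    // and loop until nothing in the range is left.
    static void DeleteBatched(string myConnString, long maxId)
    {
        using (var connection = new SqlConnection(myConnString))
        {
            connection.Open();
            int deleted;
            do
            {
                using (var cmd = new SqlCommand(
                    "delete top (4999) from myTable where id <= @maxId", connection))
                {
                    cmd.Parameters.AddWithValue("@maxId", maxId);
                    cmd.CommandTimeout = 300;
                    deleted = cmd.ExecuteNonQuery(); // rows removed this batch
                }
            } while (deleted > 0);
        }
    }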

First, I made a connection pool, like so:

// One SqlConnection per worker task, created up front.
List<SqlConnection> connectionPool = new List<SqlConnection>();
for (int i = 0; i < connectionPoolSize; i++)
    connectionPool.Add(new SqlConnection(myConnString));
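(Side note: ADO.NET already pools physical connections per connection string, so each task could equally just open its own SqlConnection and let the built-in pool handle reuse; a minimal sketch of that alternative:)

    // Leaning on ADO.NET's built-in pooling: create, open, and dispose a
    // SqlConnection per task; the pool recycles the underlying connection.
    using (var connection = new SqlConnection(myConnString))
    {
        connection.Open();
        // ... run this task's delete batches here ...
    }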

Later, I created a list of tasks and fed each of them a connection. I attempted to get around table locking by keeping the number of records each task deletes at 4,999, just under the roughly 5,000-lock threshold at which SQL Server escalates row locks to a table lock. Here we go with the tasks (I debugged through my other code; trust that I'm giving a valid range of row ids, and a separate range to each task. I omitted that code, though a hypothetical version is sketched after the loop below):

for (int i = 0; i < connectionPool.Count; i++)
{
    // lowId, highId, and dbConnection are assigned per iteration (code omitted);
    // they need to be locals scoped inside the loop, or every lambda will
    // capture the same variables and observe whatever they hold last.
    Task task = new Task(() => DeleteRows(lowId, highId, dbConnection));
    task.Start();
    myTaskList.Add(task);
}
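A hypothetical version of the omitted range assignment, using Task.Run and per-iteration locals so each lambda captures its own values (minId, maxId, and the chunking are illustrative):

    // Hypothetical partitioning: split [minId, maxId] evenly across the pool.
    long minId = 1, maxId = 1_000_000_000; // illustrative bounds
    long chunk = (maxId - minId + 1) / connectionPool.Count;

    for (int i = 0; i < connectionPool.Count; i++)
    {
        // Locals declared inside the loop body, so each closure is independent.
        long low = minId + i * chunk;
        long high = (i == connectionPool.Count - 1) ? maxId : low + chunk;
        SqlConnection conn = connectionPool[i];

        myTaskList.Add(Task.Run(() => DeleteRows(low, high, conn)));
    }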

Later, I attempted to wait for any thrown exceptions. However, every task shows as RanToCompletion, and the catch handler below is never triggered:

try
{
    Task.WaitAll(myTaskList.ToArray());
    foreach (Task task in myTaskList)
    {
        var result = task.Status; // every one of these comes back RanToCompletion
    }
}
catch (AggregateException e)
{
    Console.WriteLine(e.ToString());
}
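As a cross-check, each task's fault state can also be inspected directly. WaitAll only throws if some task actually faulted, so RanToCompletion means DeleteRows returned without throwing; a small sketch:

    foreach (Task task in myTaskList)
    {
        if (task.IsFaulted && task.Exception != null)
        {
            // Task.Exception is an AggregateException wrapping whatever was thrown.
            foreach (Exception inner in task.Exception.InnerExceptions)
                Console.WriteLine(inner);
        }
        else
        {
            Console.WriteLine($"Task {task.Id}: {task.Status}");
        }
    }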

And here's the delete method I'm calling in each task. It works when I single-thread it, and the only exceptions I've ever seen are timeouts:

    static void DeleteRows(long lowId, long highId, SqlConnection connection)
    {
        if (connection.State != ConnectionState.Open)
        {
            connection.Open();
        }

        // Explicit transaction so a failed batch rolls back on dispose
        // instead of leaving locks held; 300s timeout instead of the default.
        using (SqlTransaction transaction = connection.BeginTransaction("Transaction"))
        using (var cmd = new SqlCommand(
            "delete from myTable where id > " + lowId + " and id < " + highId,
            connection,
            transaction)
        {
            CommandTimeout = 300
        })
        {
            cmd.ExecuteNonQuery();
            transaction.Commit();
        }
    }

The transaction in the method above is a recent addition; I've been casting around at this point. Either way, it's not working.
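One check I could add is logging the row count that ExecuteNonQuery returns, since a silent zero would mean the range matched nothing rather than the call failing. A hypothetical instrumented variant of the method above:

    // Hypothetical variant of DeleteRows that surfaces the affected-row count.
    static int DeleteRowsCounted(long lowId, long highId, SqlConnection connection)
    {
        if (connection.State != ConnectionState.Open)
            connection.Open();

        using (SqlTransaction transaction = connection.BeginTransaction())
        using (var cmd = new SqlCommand(
            "delete from myTable where id > " + lowId + " and id < " + highId,
            connection, transaction) { CommandTimeout = 300 })
        {
            int rowsAffected = cmd.ExecuteNonQuery();
            Console.WriteLine($"Deleted {rowsAffected} rows in ({lowId}, {highId})");
            transaction.Commit();
            return rowsAffected;
        }
    }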

  • Does it have anything to do with it not running? I'm simply testing how fast I can make SQL Server churn and whether I can do multithreaded inserts, updates, and deletes. – JakeJ May 29 '18 at 20:34
  • Thank you. Yes, this code isn't going into production use and only represents an attempt to get performance and cost for various sizes of database at load. – JakeJ May 29 '18 at 20:43
  • The SQL Server is multi-threaded and locks data, so there are never any conflicts between different processes accessing the same data at the same time. You must check the integer returned from cmd.ExecuteNonQuery to see the number of rows that changed; if you are getting zero rows changed, then the query did not complete, probably due to a locking conflict. – jdweng May 29 '18 at 20:44
  • Anybody else wondering about parameterization: https://stackoverflow.com/questions/6413938/sql-delete-command – JakeJ May 29 '18 at 20:45
  • Too funny, just figured it out: I was calling another method to get the highest id, and throwing an exception as a note to self if it wasn't changing. Turns out that "<" rather than "<=" was the problem. There are big chunks of ids missing, but the top one remained, as well as one every roughly 5k records. Oh for the love of... – JakeJ May 29 '18 at 20:53
  • @JakeJ 1) add more console logging 2) I think we're missing your bug in code you haven't provided 3) Try TaskFactory.StartNew instead of new Task? 4) Is this async code at all? – TheSoftwareJedi May 29 '18 at 20:54
  • Ha, yeah, see #2 above.... – TheSoftwareJedi May 29 '18 at 20:54
  • Task.WaitAll(myTaskList.ToArray()) isn't async enough? lol – JakeJ May 29 '18 at 21:14
  • You are making a bad assumption that the IDs are actually getting removed; I would make sure you are not getting zero from cmd.ExecuteNonQuery(). – jdweng May 29 '18 at 21:52
  • Nice. Indeed I am, though pausing it for long enough to see it gives me timeouts. Running the command in SSMS also returns 0 rows affected. Then changing it from a delete to a select makes it take forever (or maybe it's permalocked); I still haven't gotten the 4999 records back after a full minute: "select * from mytable where id >= 1199915322 and id <= 1199920321" – JakeJ May 29 '18 at 22:07
  • My major problem here is that I could never be sure when the table was going to be locked; sometimes operations would take place, sometimes they would not. There was no certainty. I suppose that if I could add conditional logic on the SQL command result, and/or an extra query to see whether the record range was still present, I could let a batch fail and retry it (along the lines of the sketch below). However, then I start to lose the advantage of being multithreaded. – JakeJ May 30 '18 at 14:21
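A sketch of the retry idea from that last comment (DeleteRowsWithRetry and the backoff policy are hypothetical; it assumes the DeleteRows method above and System.Threading for the sleep):

    // Hypothetical retry wrapper: if a batch times out or hits a lock
    // conflict, back off briefly and try the same range again.
    static void DeleteRowsWithRetry(long lowId, long highId,
                                    SqlConnection connection, int maxAttempts = 3)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                DeleteRows(lowId, highId, connection);
                return; // batch committed
            }
            catch (SqlException ex)
            {
                if (attempt == maxAttempts) throw;
                Console.WriteLine($"Attempt {attempt} failed: {ex.Message}; retrying");
                Thread.Sleep(1000 * attempt); // simple linear backoff
            }
        }
    }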
