
We are experiencing an issue where several hundred threads are trying to update a table ID, similar to this post, and sometimes encountering errors such as:

Cannot insert duplicate key in object dbo.theTable. The duplicate key value is (100186).

The method that is being executed hundreds of times in parallel executes several stored procedures:

// Create the temp table up front on the shared connection.
using (var createTempTableCommand = new SqlCommand())
{
    createTempTableCommand.CommandText = createTempTableScript;
    createTempTableCommand.Connection = omniaConnection;
    createTempTableCommand.ExecuteNonQuery();
}

// Then run each stored-procedure command and collect its results.
foreach (var command in listOfSqlCommands)
{
    using (var da = new SqlDataAdapter(command))
    {
        da.Fill(dtResults);
    }
}

In order to recreate such an environment/scenario, is it advisable to simply record a trace and then replay it?

How do we recreate an environment with high concurrency?

Alex Gordon

2 Answers

  1. You can avoid all deadlocks/dirty reads only by rewriting your solution to run sequentially instead of in parallel.
  2. You can accept that some errors will occur and add appropriate error handling. A run that is blocked or fails with a duplicate key can simply be started again (a retry sketch appears after the recommendation below).
  3. You can try to rewrite your solution so that multiple threads never touch the same rows at the same time. You would have to change your transaction isolation level (https://msdn.microsoft.com/en-us/library/ms709374(v=vs.85).aspx) and switch to row-level locking (probably a combination of ROWLOCK and UPDLOCK hints; see the sketch after this list). This will minimize your errors, but it cannot handle all of them.
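
A minimal sketch of what option 3 could look like, assuming the duplicate key comes from a "read MAX(Id), then insert" pattern against dbo.theTable; the Id and Payload columns, the HOLDLOCK hint, and the surrounding transaction are assumptions for illustration, not code from the question:

// Hedged sketch: serialize the "read next id, then insert" step inside one
// transaction so two threads cannot read the same MAX(Id). ROWLOCK/UPDLOCK
// follow the suggestion above; table and column names are placeholders.
const string insertWithNextIdSql = @"
    BEGIN TRANSACTION;

    DECLARE @nextId int;
    SELECT @nextId = ISNULL(MAX(Id), 0) + 1
    FROM dbo.theTable WITH (ROWLOCK, UPDLOCK, HOLDLOCK);

    INSERT INTO dbo.theTable (Id, Payload)
    VALUES (@nextId, @payload);

    COMMIT TRANSACTION;";

using (var insertCommand = new SqlCommand(insertWithNextIdSql, omniaConnection))
{
    insertCommand.Parameters.AddWithValue("@payload", "example");
    insertCommand.ExecuteNonQuery();
}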

So I recommend option 2. In some solutions it is better to run the command without a transaction: you can handle failures without blocking other threads and enforce relational constraints in a later step.
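
A hedged sketch of what that error handling might look like around the question's inner loop (reusing command and dtResults from the code above); the retry count, the backoff, and the choice of SQL Server error numbers 2627/2601 (duplicate key) and 1205 (deadlock victim) are assumptions about what is worth retrying:

const int maxAttempts = 3;

for (var attempt = 1; attempt <= maxAttempts; attempt++)
{
    try
    {
        using (var da = new SqlDataAdapter(command))
        {
            da.Fill(dtResults);
        }
        break; // success, stop retrying
    }
    catch (SqlException ex) when (ex.Number == 2627 || ex.Number == 2601 || ex.Number == 1205)
    {
        // Duplicate key or deadlock victim: rerun the command a few times before giving up.
        if (attempt == maxAttempts)
        {
            throw;
        }
        Thread.Sleep(100 * attempt); // simple linear backoff between attempts
    }
}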

And for "similar post" - the same way. Error handling will be better in your app. Prevent to use cursor solutions like in similar post, because in goes against database fundamendals. Collect data into sets and use sets.

Deadsheep39

I don't think tracing is a good approach to reproducing a high concurrency environment, because the cost of tracing will itself skew the results and it's not really designed for that purpose. Playback won't necessarily be faithful to the timing of the incoming events.

I think you're better off creating specific load tests that will hopefully exercise the problems you're encountering: rent some virtual machines and beat the heck out of a load-test database (a minimal sketch follows).
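
Before reaching for a framework, a minimal hand-rolled load test is often enough to reproduce this kind of race: fire the same code path from many tasks at once against a throwaway database and watch for the duplicate key errors. The CreateRecord method name, the caller count, and loadTestConnectionString below are assumptions for the sketch:

// Minimal concurrency smoke test: hammer the same code path from many tasks
// at once against a load-test database.
const int concurrentCallers = 200;

var tasks = Enumerable.Range(0, concurrentCallers)
    .Select(_ => Task.Run(() =>
    {
        using (var connection = new SqlConnection(loadTestConnectionString))
        {
            connection.Open();
            CreateRecord(connection); // the method under test; name assumed
        }
    }))
    .ToArray();

Task.WaitAll(tasks);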

Having said that, tracing is a good way to discover what the actual workload is. Sometimes, you're not seeing all the activity that's coming against your database. Maybe there are some "oh yeah" jobs running when the particular problems present themselves. Hundreds of possibilities I'm afraid - and not something that can be readily diagnosed without a lot more clues.

Clay
  • great! could you recommend some specific frameworks for this type of testing? and perhaps some manual ways to test? – Alex Gordon Dec 25 '16 at 17:33