
I am building a batch processing system. Batches of Units come in quantities from 20 to 1,000. Each Unit is essentially a hierarchy of models (one main model and many child models). My task involves saving each model hierarchy to the database as a single transaction (each hierarchy either commits or rolls back). Unfortunately, EF was unable to handle two portions of the model hierarchy because each can potentially contain thousands of records.

What I've done to resolve this is set up SqlBulkCopy to handle the two potentially high-count models and let EF handle the rest of the inserts (and referential integrity).

Batch Loop:

foreach (var unitDetails in BatchUnits)
{
  var unitOfWork = new Unit(unitDetails);
  Task.Factory.StartNew(() =>
    {
      unitOfWork.ProcessX(); // data preparation
      unitOfWork.ProcessY(); // data preparation
      unitOfWork.PersistCase();
    });
}
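
Note that the loop above is fire-and-forget; a variation that tracks the tasks so exceptions thrown inside PersistCase actually surface might look like this (a sketch, using the same BatchUnits collection and Unit class):

var tasks = new List<Task>();
foreach (var unitDetails in BatchUnits)
{
  var unitOfWork = new Unit(unitDetails);
  tasks.Add(Task.Factory.StartNew(() =>
    {
      unitOfWork.ProcessX(); // data preparation
      unitOfWork.ProcessY(); // data preparation
      unitOfWork.PersistCase();
    }));
}

// Rethrows any persistence failures as an AggregateException
Task.WaitAll(tasks.ToArray());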

Unit:

class Unit
{
  public void PersistCase()
  {
    using (var dbContext = new CustomDbContext())
    {
      // Need an explicit transaction so that 
      // EF + SqlBulkCopy act as a single block
      using (var scope = new TransactionScope(TransactionScopeOption.Required,
        new TransactionOptions() {
          IsolationLevel = System.Transactions.IsolationLevel.ReadCommitted
        }))
      {
        // Let EF Insert most of the records
        // Note Insert is all it is doing, no update or delete
        dbContext.Units.Add(thisUnit);
        dbContext.SaveChanges();  // deadlocks, DbConcurrencyExceptions here

        // Copy Auto Inc Generated Id (set by EF) to DataTables
        // for referential integrity of SqlBulkCopy inserts
        CopyGeneratedId(thisUnit.AutoIncrementedId, dataTables);

        // Execute SqlBulkCopy for potentially numerous model #1
        SqlBulkCopy bulkCopy1 = new SqlBulkCopy(...);
        ...
        bulkCopy1.WriteToServer(dataTables["#1"]);

        // Execute SqlBulkCopy for potentially numerous model #2
        SqlBulkCopy bulkCopy2 = new SqlBulkCopy(...);
        ...
        bulkCopy2.WriteToServer(dataTables["#2"]);

        // Commit transaction
        scope.Complete();
      }
    }
  }
}
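
For completeness, the elided SqlBulkCopy setup looks roughly like this (a sketch; the destination table names are placeholders). The key point is that a SqlConnection opened inside the TransactionScope auto-enlists in it, so WriteToServer participates in the same transaction as EF's inserts:

using (var connection = new SqlConnection(connectionString))
{
  connection.Open(); // enlists in the ambient TransactionScope

  using (var bulkCopy1 = new SqlBulkCopy(connection))
  {
    bulkCopy1.DestinationTableName = "dbo.ChildModel1"; // placeholder
    bulkCopy1.WriteToServer(dataTables["#1"]);
  }

  using (var bulkCopy2 = new SqlBulkCopy(connection))
  {
    bulkCopy2.DestinationTableName = "dbo.ChildModel2"; // placeholder
    bulkCopy2.WriteToServer(dataTables["#2"]);
  }
}

Be aware that because the DbContext opens its own connection, having a second connection inside the scope will typically escalate the transaction to a distributed one (MSDTC).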

Right now I'm essentially stuck between a rock and a hard place. If I leave the IsolationLevel set to ReadCommitted, I get deadlocks between EF INSERT statements in different Tasks.

If I set the IsolationLevel to ReadUncommitted (which I thought would be fine since I'm not doing any SELECTs), I get DbConcurrencyExceptions.

I've been unable to find any good information about DbConcurrencyExceptions and Entity Framework, but I'm guessing that ReadUncommitted is essentially causing EF to receive invalid "rows inserted" information.

UPDATE

Here is some background information on what is actually causing my deadlocks during the INSERTs:

http://connect.microsoft.com/VisualStudio/feedback/details/562148/how-to-avoid-using-scope-identity-based-insert-commands-on-sql-server-2005

Apparently this same issue was present a few years ago when LINQ to SQL came out, and Microsoft fixed it by changing how scope_identity() gets selected. I'm not sure why their position has changed to this being a SQL Server problem when the same issue has come up in Entity Framework.


1 Answer


This issue is explained fairly well here: http://connect.microsoft.com/VisualStudio/feedback/details/562148/how-to-avoid-using-scope-identity-based-insert-commands-on-sql-server-2005

Essentially it's an internal EF issue. I migrated my code to use LINQ to SQL and it now works fine (it no longer does the unnecessary SELECT for the identity value).
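
The migrated persistence path is basically this shape (a rough sketch; the DataContext and entity names are placeholders):

using (var db = new UnitsDataContext())
{
  db.Units.InsertOnSubmit(thisUnit);
  // SubmitChanges reads the identity back with a direct
  // SELECT SCOPE_IDENTITY()-style statement instead of re-querying the table
  db.SubmitChanges();
}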

Relevant quote from the same issue in LINQ to SQL, which was fixed:

When a table has an identity column, Linq to SQL generates extremely inefficient SQL for insertion into such a table. Assume the table is Order and the identity column is Id. The SQL generated is:

exec sp_executesql N'INSERT INTO [dbo].[Order]([Colum1], [Column2]) VALUES (@p0, @p1)

SELECT [t0].[Id] FROM [dbo].[Order] AS [t0] WHERE [t0].[Id] = (SCOPE_IDENTITY()) ',N'@p0 int,@p1 int,@p0=124,@p1=432

As one can see instead of returning SCOPE_IDENTITY() directly by using 'SELECT SCOPE_IDENTITY()', the generated SQL performs a SELECT on the Id column using the value returned by SCOPE_IDENTITY(). When the number of the records in the table is large, this significantly slows down the insertion. When the table is partitioned, the problem gets even worse.
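
For contrast, the efficient pattern the report asks for (returning SCOPE_IDENTITY() directly) looks like this in raw ADO.NET; the table and column names are hypothetical:

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
  @"INSERT INTO [dbo].[Order] ([Column1], [Column2]) VALUES (@p0, @p1);
    SELECT SCOPE_IDENTITY();", connection))
{
  command.Parameters.AddWithValue("@p0", 124);
  command.Parameters.AddWithValue("@p1", 432);

  connection.Open();
  // ExecuteScalar returns the first column of the first row: the new Id
  var newId = Convert.ToInt32(command.ExecuteScalar());
}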
