I am building a batch processing system. Batches of Units
come in quantities from 20-1000. Each Unit
is essentially a hierarchy of models (one main model and many child models). My task involves saving each model hierarchy to a database as a single transaction (either each hierarchy commits or it rolls back). Unfortunately EF
was unable to handle two portions of the model hierarchy due to their potential to contain thousands of records.
What I've done to resolve this is set up SqlBulkCopy
to handle these two potentially high count models and let EF
handle the rest of the inserts (and referential integrity).
Batch Loop:
foreach (var unitDetails in BatchUnits)
{
var unitOfWork = new Unit(unitDetails);
Task.Factory.StartNew(() =>
{
unitOfWork.ProcessX(); // data preparation
unitOfWork.ProcessY(); // data preparation
unitOfWork.PersistCase();
});
}
Unit:
class Unit
{
public PersistCase()
{
using (var dbContext = new CustomDbContext())
{
// Need an explicit transaction so that
// EF + SqlBulkCopy act as a single block
using (var scope = new TransactionScope(TransactionScopeOption.Required,
new TransactionOptions() {
IsolationLevel = System.Transaction.IsolationLevel.ReadCommitted
}))
{
// Let EF Insert most of the records
// Note Insert is all it is doing, no update or delete
dbContext.Units.Add(thisUnit);
dbContext.SaveChanges(); // deadlocks, DbConcurrencyExceptions here
// Copy Auto Inc Generated Id (set by EF) to DataTables
// for referential integrity of SqlBulkCopy inserts
CopyGeneratedId(thisUnit.AutoIncrementedId, dataTables);
// Execute SqlBulkCopy for potentially numerous model #1
SqlBulkCopy bulkCopy1 = new SqlBulkCopy(...);
...
bulkCopy1.WriteToServer(dataTables["#1"]);
// Execute SqlBulkCopy for potentially number model #2
SqlBulkCopy bulkCopy2 = new SqlBulkCopy(...);
...
bulkCopy2.WriteToServer(dataTables["#2"]);
// Commit transaction
scope.Complete();
}
}
}
}
Right now I'm essentially stuck between a rock and a hard place. If I leave the IsolationLevel
set to ReadCommitted
, I get deadlocks between EF
INSERT
statements in different Tasks
.
If I set the IsolationLevel
to ReadUncommitted
(which I thought would be fine since I'm not doing any SELECTs
) I get DbConcurrencyExceptions
.
I've been unable to find any good information about DbConcurrencyExceptions
and Entity Framework
but I'm guessing that ReadUncommitted
is essentially causing EF
to receive invalid "rows inserted" information.
UPDATE
Here is some background information on what is actually causing my deadlocking issues while doing INSERTS:
Apparently this same issue was present a few years ago when Linq To SQL came out and Microsoft fixed it by changing how scope_identity() gets selected. Not sure why their position has changed to this being a SQL Server problem when the same issue came up with Entity Framework.