I am writing a small program that takes in a .csv file as input with about 45k rows. I am trying to compare the contents of this file with the contents of a table on a database (SQL Server through dynamics CRM using Xrm.Sdk if it makes a difference).
In my current program (which takes about 25 minutes to compare - the file and database are the exact same here both 45k rows with no differences), I have all existing records on the database in a DataCollection<Entity>
which inherits Collection<T>
and IEnumerable<T>
In my code below I am filtering using the Where
method and then doing a logic based the count of matches. The Where
seems to be the bottleneck here. Is there a more efficient approach than this? I am by no means a LINQ expert.
foreach (var record in inputDataLines)
{
var fields = record.Split(',');
var fund = fields[0];
var bps = Convert.ToDecimal(fields[1]);
var withdrawalPct = Convert.ToDecimal(fields[2]);
var percentile = Convert.ToInt32(fields[3]);
var age = Convert.ToInt32(fields[4]);
var bombOutTerm = Convert.ToDecimal(fields[5]);
var matchingRows = existingRecords.Entities.Where(r => r["field_1"].ToString() == fund
&& Convert.ToDecimal(r["field_2"]) == bps
&& Convert.ToDecimal(r["field_3"]) == withdrawalPct
&& Convert.ToDecimal(r["field_4"]) == percentile
&& Convert.ToDecimal(r["field_5"]) == age);
entitiesFound.AddRange(matchingRows);
if (matchingRows.Count() == 0)
{
rowsToAdd.Add(record);
}
else if (matchingRows.Count() == 1)
{
if (Convert.ToDecimal(matchingRows.First()["field_6"]) != bombOutTerm)
{
rowsToUpdate.Add(record);
entitiesToUpdate.Add(matchingRows.First());
}
}
else
{
entitiesToDelete.AddRange(matchingRows);
rowsToAdd.Add(record);
}
}
EDIT: I can confirm that all existingRecords
are in memory before this code is executed. There is no IO or DB access in the above loop.