I have a large import (excel file) in which I'd like to read rows in batches, asynchronously. Using EPPlus 4.x, I've tried the following, but it locks up. Anybody have ideas on how to do this?
In a nutshell, I'm creating batches of 200 ExcelRange
s (haven't seen how I can get an actual Row
reference) and then asynchronously processing each batch. However, only one row from each batch is getting processed. I've followed this pattern many times before, but not with EPPlus.
{
//...
int batchSize = 200;
int batchCount = (int)Math.Ceiling((double)(rowCount-1) / (double)batchSize);
// start at row 2
int processCount = 2;
List<ExcelRange> batches = new List<ExcelRange>();
while (processCount < rowCount)
{
var lastRow = processCount + batchSize;
if (lastRow > rowCount)
lastRow = rowCount;
batches.Add(_worksheet.Cells[processCount, 1, lastRow,
_worksheet.Dimension.Columns]);
processCount = lastRow + 1;
}
// process batches asynchronously
var tasks = batches.Select(batch => ProcessBatch(batch));
await Task.WhenAll(tasks);
//...
}
private async Task ProcessBatch(ExcelRange batch)
{
for (var rowIndex = batch.Start.Row; rowIndex <= batch.End.Row; rowIndex++)
{
// get Fund instance from row
var parsedFund = await GetParsedFund(batch, rowIndex);
// determine if update/insert
await ResolveFund(parsedFund);
}
}