0

I have a large import (excel file) in which I'd like to read rows in batches, asynchronously. Using EPPlus 4.x, I've tried the following, but it locks up. Anybody have ideas on how to do this?

In a nutshell, I'm creating batches of 200 ExcelRanges (haven't seen how I can get an actual Row reference) and then asynchronously processing each batch. However, only one row from each batch is getting processed. I've followed this pattern many times before, but not with EPPlus.

{
    //...
    int batchSize = 200;
    int batchCount = (int)Math.Ceiling((double)(rowCount-1) / (double)batchSize); 
    // start at row 2
    int processCount = 2;
    
    List<ExcelRange> batches = new List<ExcelRange>();
    while (processCount < rowCount)
    {
        var lastRow = processCount + batchSize;
        if (lastRow > rowCount)
             lastRow = rowCount; 
                    
        batches.Add(_worksheet.Cells[processCount, 1, lastRow,
            _worksheet.Dimension.Columns]);
        processCount = lastRow + 1;
    }
    
    // process batches asynchronously
    var tasks = batches.Select(batch => ProcessBatch(batch));
    await Task.WhenAll(tasks);
    //...
}

private async Task ProcessBatch(ExcelRange batch)
{ 
    for (var rowIndex = batch.Start.Row; rowIndex <= batch.End.Row; rowIndex++)
    {                
        // get Fund instance from row
        var parsedFund = await GetParsedFund(batch, rowIndex);

        // determine if update/insert
        await ResolveFund(parsedFund);
    }
}
Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
razaross444
  • 513
  • 4
  • 15
  • Do you modify the `ExcelRange`s during the parallel processing, or you are just reading them? Most probably the type is not thread-safe. – Theodor Zoulias Jan 15 '21 at 04:21
  • @TheodorZoulias I'm just reading one ExcelRange in each thread. – razaross444 Jan 15 '21 at 18:29
  • Most classes are safe for multithreaded read operations, like the [ADO.NET classes](https://stackoverflow.com/questions/16155180/which-ado-net-dataset-datatable-methods-are-safe-for-multiple-reader-threads) for example. But this rule may have exceptions, and the `ExcelRange` could be one of them. You should look at the documentation, or the source code, or ask the vendor about it. Otherwise you should assume that it's not thread-safe for reading, and do all the reading on one thread. – Theodor Zoulias Jan 15 '21 at 18:38

0 Answers0