0

I am creating an Excel file using Open XML SDK. In this process, I have a scenario like below.

I need to add data into a Dictionary<uint, string> if key is not exists. For that I am using below code.

var dataLines = sheetData.Elements<Row>().ToList();

for (int i = 0; i < dataLines.Count; i++)
{
    var x = dataLines[i];
    if (!dataDictionary.TryGetValue(x.RowIndex.Value, out var res)) // 700 seconds, 1,279,999,998 Hit counts
    {
        dataDictionary.Add(x.RowIndex.Value, x.OuterXml);
    }
}

When I am trying to create an Excel sheet which has rows around 90,000 - 92,000, the line with the IF condition in above code consume 700 seconds to complete. (checked with a performance profiler, also this line has 1,279,999,998 Hit counts).

How could I reduce the time the line with the IF condition in above code consumes?

Is there any better way to achive this with less time?

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
Bishan
  • 15,211
  • 52
  • 164
  • 258
  • 1
    You are processing 1.2 billion spreadsheet rows? I think 0.0000005s per call is already pretty good. I don't think `DocumentFormat.OpenXml.Spreadsheet.Row` has a `.Key` property, what key's does your dictionary have? Does the key have a good `.GetHashCode()` implementation? – Jeremy Lakeman Apr 27 '22 at 04:13
  • 1
    Jeremy got in before me. I was also going to say that the issue is not the time taken per call but the number of calls. Processing that much data and expecting not to take some time is rather unreasonable. How long do you think it should take to check a `Dictionary` that many times? – John Apr 27 '22 at 04:20
  • 1
    FYI The default hashcode of an int / uint is just the value itself. Which might perform poorly depending on the types of values you are adding to the dictionary. You might end up with all keys mapping to the same internal hash bucket. – Jeremy Lakeman Apr 27 '22 at 04:22
  • @JeremyLakeman I am processing rows around 90,000 - 92,000. It's `.RowIndex.Value`. Updated the question. Dictionary have `uint` key. I didn't manually implemented `.GetHashCode()` – Bishan Apr 27 '22 at 04:22
  • 4
    Are you sure you don't have some weird loop around this code, causing you to process the same 90k rows ~14k times on average? – Jeremy Lakeman Apr 27 '22 at 04:25
  • Yeah, I think I need to check why there is 1,279,999,998 Hit counts. – Bishan Apr 27 '22 at 04:26
  • Are you targeting the .NET 6 or an order platform? – Theodor Zoulias Apr 27 '22 at 04:49
  • @TheodorZoulias .NET Framework 4.5.2 – Bishan Apr 27 '22 at 05:37
  • 1
    *I didn't manually implemented .GetHashCode()* - there's no point with a uint key – Caius Jard Apr 27 '22 at 05:43
  • 2
    If you take the `x.RowIndex.Value` out to a separate line/stored variable, does the `if` remain the slow line, or is `x.RowIndex.Value` actually a really slow operation that you do twice? – Caius Jard Apr 27 '22 at 05:47
  • @CaiusJard Good point. I took the `x.RowIndex.Value` out to a separate variable and it is the slow operation. Not the `if`. – Bishan Apr 27 '22 at 06:07
  • 1
    Perhaps you can load all your data out of the sheet in some fast way (I like Mark Pflug's Sylvan library) and into some more rapid container/an array etc – Caius Jard Apr 27 '22 at 08:14

1 Answers1

1

If the if statement is slow, one option you have is to eliminate it entirely and use the indexer of the dictionary to set the value. This means that the "last match will win". If you want the "first match to win", all you have to do is reverse the order you are iterating the list.

var dataLines = sheetData.Elements<Row>().ToList();

for (int i = dataLines.Count - 1; i >= 0; i--)
{
    var x = dataLines[i];
    dataDictionary[x.RowIndex.Value] = x.OuterXml;
}
  • If x.RowIndex.Value is unique, it doesn't matter which direction you iterate.
  • If it is important that the key is sorted in ascending order, you can use a SortedDictionary<TKey, TValue>.

But as others have pointed out, it seems odd that you have so many hit counts. There is probably recursion going on in your application that you need to track down.

NightOwl888
  • 55,572
  • 24
  • 139
  • 212
  • Thank you. Currently checking recursion in my application. Also, I'll try with your suggestions too. – Bishan Apr 27 '22 at 05:41