0

I'm trying to find the fastest and most performant way to select a subset of items from a list based on a key property and assign this subset (list) to the property of an item in another list. The performance side of this is important since this part of code is going to be invoked very often each workday. I measured the performance in ticks to clearly see the relative difference.

I've got two lists (example setup);

List<CategorySetting> catList;
List<Customer> custList;

The CategorySetting entity has a property called SettingsId. The Customer entity has a SettingsId property as well, which is in fact a foreign key from Customers to CategorySetting.

The first piece of code i wrote was the most straight forward;

// normal for each: 13275 ticks
foreach (var catItem in catList)
{
   catItem.Customers = custList.Where(c => c.SettingsId == catItem.SettingsId).ToList();
}

This would take about 13275 ticks.

I then thought maybe using parallelism this could be a lot faster? So I wrote this piece of code;

// parallel for each: 82541 ticks
Parallel.ForEach(catList, catItem =>
{
   catItem.Customers = custList.Where(c => c.SettingsId == catItem.SettingsId).ToList();
});

This took way longer; 82541 ticks. That made no sense to me because of the parallel nature of this approach. It should use multiple threads to do this so in theory should be much faster. Then I started wondering what would happen if the multiple threads would try to access the customerlist at the same time. That might result in locks and queues hence taking more time because of the overhead? The same as for the writing to the main list.

I tried another approach. I made a ConcurrentBag for the catList (main list).

ConcurrentBag<CategorySettings> csBag = new ConcurrentBag<CategorySettings>(catList);

The custList I punt into a ConcurrentDictionary already grouped by the SettingsId.

var dict = custList.GroupBy(c => c.SettingsId).ToDictionary(x => x.Key, y => y.ToList());
ConcurrentDictionary<int?, List<Customer>> concDict = new ConcurrentDictionary<int?, List<Customer>>(dict);

The final try was then like this:

// paralell, bag, concurrent dictionary: 40255
Parallel.ForEach(caBag, ca =>
{   
   concDict.TryGetValue(ca.SettingsId, out var selCust);
   ca.Customers = selCust;
});

This would take 40255 ticks. Can anyone explain why this is still taking longer? And more important is there no other way then 'just' a foreach loop? Feels like i'm missing something here.

Any ideas are greatly appreciated!

Obelix
  • 708
  • 11
  • 24
  • In my experience Parallel.ForEach is significantly slower than a regular for each unless you're working on hundreds of thousands of items. Even then it tends to be not much better. – DetectivePikachu Nov 20 '19 at 19:51
  • Marking this question, *"Can anyone explain why this [Parallel.ForEach] is still taking longer [than foreach]?"* as duplicate. – Rufus L Nov 20 '19 at 19:58
  • @TheodorZoulias Re-opened. Answer away! – Rufus L Nov 20 '19 at 22:20
  • When I test, I get that `Parallel.ForEach` is twice as fast (.Net Core 3) as `foreach` for about 20,000 ticks for `foreach` (200 Setting Ids, 1846 customers (4 - 10 each)). I think the impact of parallel really depends on your CPU - I have 4 cores hyperthreaded for 8 logical CPUs. – NetMage Nov 21 '19 at 20:18
  • 2
    13,275 ticks is around 1.3 milliseconds - how many times per day are you running this? Even if you dropped it to zero, this is only 1/6500 of a day - you'd have to run this every two seconds to save one minute in the day. Are you sure this isn't _very_ premature optimization? – NetMage Nov 21 '19 at 20:26
  • @NetMage; got me! I'm going to isolate this in a very small example console app and run it on a different machine. I'm running on development server in a private cloud (virtualized). So my CPU is a Xeon 4 core. But these are virtual cores. I'm not sure how these react versus a real quadcore (like the skylake i've got at home). I'm getting back on that. – Obelix Nov 22 '19 at 13:24

2 Answers2

1

You could try using the ToLookup LINQ method:

var customersLookup = custList.ToLookup(item => item.SettingsId);

foreach (var catItem in catList)
{
    catItem.Customers = customersLookup[catItem.SettingsId].ToList();
}

I assumed that the CategorySetting class has a writable property Customers of type IList<Customer>. In case the property is of type IEnumerable<Customer>, you could omit the ToList call.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
  • Note that at 13,275 ticks for the `foreach` loop, the overhead of `ToLookup` make this much slower unless the `custList` doesn't change between `catItem` updates. – NetMage Nov 21 '19 at 20:30
  • Also, using `GroupBy`+`ToDictionary` to setup a `Dictionary>` (e.g. `custList.GroupBy(item => item.SettingsId).ToDictionary(g => g.Key, g => g.ToList());`) is faster than doing `ToList` in the `foreach` loop, assuming you need a `List`. – NetMage Nov 21 '19 at 21:39
0

I ended up using what I had;

foreach (var catItem in catList)
{
    catItem.Customers = custList.Where(c => c.SettingsId == catItem.SettingsId).ToList();
}

@NetMage pointed out very well that this is probably useless optimalisation. I figured that the paralell foreach would be of major use here, but no matter what, the first loop was still the fastest. Both on my quadcore and on my xeon.

Olle Sjögren
  • 5,315
  • 3
  • 31
  • 51
Obelix
  • 708
  • 11
  • 24
  • 1
    This has O(n²) computational complexity, so it should be very fast or very slow, depending on the size of the `custList`. Small list => fast. Big list => slow. – Theodor Zoulias Dec 04 '19 at 03:39
  • The custList is always 355 items (or less). never more.Thank you for your explanation. Reminds me how less I know :) – Obelix Dec 04 '19 at 14:32
  • 100-200 items are usually the tipping point where hashing approaches (`ToLookup`) start to outperform brute-force loops (`Where`). So I guess that for the given max size of the list you are OK with your chosen solution. – Theodor Zoulias Dec 04 '19 at 14:57