
I keep running into a lack of guidance choosing proper initial capacities for ConcurrentDictionary<TKey, TValue>.

My general use case is those situations where you really want to do something like the following, but cannot:

public static class StaticCache<T>
{
    public static readonly Action CompiledExpression = ...;
}

This generic-based approach avoids a dictionary lookup, but can only be used if we always know the required type at compile time. If we only have a Type known at runtime, we can no longer use this approach. The next contender is a ConcurrentDictionary<TKey, TValue>.
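A sketch of that runtime-typed fallback, with hypothetical names (`RuntimeCache` and `BuildDelegate` are illustrative, not from any original code):

```csharp
using System;
using System.Collections.Concurrent;

public static class RuntimeCache
{
    // Runtime-keyed counterpart of StaticCache<T>: one delegate per Type.
    private static readonly ConcurrentDictionary<Type, Action> CompiledExpressions =
        new ConcurrentDictionary<Type, Action>();

    public static Action Get(Type type) =>
        // Unlike the static readonly field, every call now pays for a hash lookup.
        CompiledExpressions.GetOrAdd(type, BuildDelegate);

    // Placeholder for the actual expression compilation step.
    private static Action BuildDelegate(Type type) =>
        () => Console.WriteLine(type.Name);
}
```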

The documentation states:

The default capacity (DEFAULT_CAPACITY), which represents the initial number of buckets, is a trade-off between the size of a very small dictionary and the number of resizes when constructing a large dictionary. Also, the capacity should not be divisible by a small prime number. The default capacity is 31.

My number of expected elements tends to be relatively small. Sometimes as small as 3 or 5, sometimes perhaps 15. As such:

  • The number of insertions over the lifetime of the application will be extremely minimal, warranting a [write] concurrency level of 1, thus optimizing for compactness and for read operations.
  • It is preferable to have the smallest possible memory footprint, to optimize cache behavior.

Since the default initial capacity is 31, we can potentially reduce our impact on the cache (as well as increase the likelihood for our dictionary to remain in the cache) by using a smaller initial capacity.
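Concretely, that means using the constructor overload that takes both a concurrency level and a capacity (the capacity of 7 below is just an illustrative guess):

```csharp
var cache = new ConcurrentDictionary<Type, Action>(
    concurrencyLevel: 1, // writes are rare; use a single lock
    capacity: 7);        // illustrative: smaller than the default of 31
```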

This raises the following questions:

  1. What does the capacity actually mean?

    • (A) That the dictionary does not need to grow to hold up to this many elements?
    • (B) A fixed percentage of A, depending on the dictionary's maximum "fullness", e.g. 75%?
    • (C) An approximation of A or B, depending on how the actual contents' hash codes distribute them?
  2. What does and does not constitute "a small prime"? Apparently, 31 does not. Does 11? Does 17? Does 23?

  3. If we do happen to want a capacity near a small prime, what capacity can we choose instead? Do we simply choose the nearest non-prime number, or are primes better for capacities and should we really choose a greater prime instead?

Timo
  • 7,992
  • 4
  • 49
  • 67
  • 2
    (2) When it says `the capacity should not be divisible by a small prime number.` it implies `other than itself`, so 31 is *not* divisible by a small prime number, given that 31 is itself a prime. (3) See my comment on (2)! – Matthew Watson Jan 31 '20 at 14:08
  • These are all great questions where I can't help but thinking that it's even better if someone invested in the whole thing actually crunched benchmarks on behalf of the common good. Hint, hint. :-P – Jeroen Mostert Jan 31 '20 at 14:09
  • 1
    If the number of insertions is minimal, and performance is really the issue, I would recommend using something else than `ConcurrentDictionary<>`, or even `Dictionary<>`. They aren't exactly cache friendly. Having something in lines of immutable sorted array, that is completely (atomically) copied + replaced on insertion is better choice. – nothrow Jan 31 '20 at 14:16
  • I was just about to basically say what @nothrow just said, with the added remark that `ImmutableDictionary` is another option (slightly simpler than maintaining an array yourself if you want to index by type). Even so a linear search through a tiny array can be as fast or even faster than a dictionary lookup, depending on the exact scenario (but using a `Dictionary`-like type anyway to cover the case where you *do* have more than X elements is better than relying on premature optimization if you can't guarantee it.) – Jeroen Mostert Jan 31 '20 at 14:18
  • Did you do any memory profiling to see of setting the capacity (1-15) actually make any noticeable difference? – Magnus Jan 31 '20 at 14:46
  • @MatthewWatson LOL! You may be right. I feel silly. :) Anyway, while this makes sense, can we be certain? – Timo Jan 31 '20 at 21:00

1 Answer


In the reference source for ConcurrentDictionary<TKey, TValue> you can see:

Node[] buckets = new Node[capacity];

So, the capacity is the effective size of the hash table. No "fullness" is considered. The only pre-processing of this number is:

if (capacity < concurrencyLevel)
{
    capacity = concurrencyLevel;
}

where concurrencyLevel is either defined by you through a constructor parameter or is the default concurrency level defined as PlatformHelper.ProcessorCount.
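Since every public constructor that accepts a capacity also requires a concurrency level, the practical consequence of that check is that a requested capacity below the concurrency level is silently raised; a sketch:

```csharp
var raised = new ConcurrentDictionary<int, int>(
    concurrencyLevel: 8,
    capacity: 3);   // capacity is raised to 8 by the check above

var kept = new ConcurrentDictionary<int, int>(
    concurrencyLevel: 1,
    capacity: 3);   // capacity stays 3
```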

The capacity is treated differently in Dictionary<TKey,TValue>. Here it is initialized with

private void Initialize(int capacity) {
    int size = HashHelpers.GetPrime(capacity);
    buckets = new int[size];
    ...
}

and HashHelpers.GetPrime gets the smallest prime which is greater than or equal to the specified capacity. Primes up to 7199369 are taken from a precalculated array. Larger ones are calculated "the hard way". It is interesting to note that the smallest considered prime is 3.

Unfortunately, HashHelpers is an internal class.
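If you want the same rounding behavior, you would have to roll your own equivalent; a minimal sketch (`NextPrime` and `IsPrime` are hypothetical helpers, not the BCL's, and the precalculated table is omitted):

```csharp
using System;

static class PrimeHelper
{
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (int i = 3; i * i <= n; i += 2)
            if (n % i == 0) return false;
        return true;
    }

    // Smallest prime >= min, mirroring HashHelpers.GetPrime's contract.
    public static int NextPrime(int min)
    {
        for (int n = Math.Max(min, 2); ; n++)
            if (IsPrime(n)) return n;
    }
}
```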

If I understand it right, neither implementation resizes at a specific fill-factor ("fullness") such as 75%; the table is effectively allowed to fill up before it is resized, and excessive collisions can additionally trigger a rehash.

If you want to

  • optimize speed: take an initial capacity which is a prime about 30% bigger than the expected maximum dictionary size. This avoids resizing.
  • optimize the memory footprint: take a prime which is about 30% bigger than the minimum expected size.
  • balance speed and memory footprint: take a number in between the two above. But in any case, take a prime.
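Applied to the question's scenario of up to ~15 expected elements, the speed-oriented choice might look like this (the numbers are illustrative):

```csharp
// 15 expected elements * 1.3 ≈ 20; the smallest prime >= 20 is 23.
var cache = new ConcurrentDictionary<Type, Action>(
    concurrencyLevel: 1,
    capacity: 23);
```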
Olivier Jacot-Descombes
  • Where does the 30% come from, and how relevant is that if the expected dictionary size is very small? – Jeroen Mostert Jan 31 '20 at 15:17
  • The number of collisions increases dramatically when a high degree of filling is reached. This is not an exact number, but one that has proven itself in practice. According to [Hash table (Wikipedia)](https://en.wikipedia.org/wiki/Hash_table#Load_factor) the default load factor for a `HashMap` in Java 10 is 0.75. 1/0.75 = 1.3333 => you get about 33%. But you can safely take a much smaller number if you prefer a small memory footprint over speed. The hash table will be resized automatically when needed. – Olivier Jacot-Descombes Jan 31 '20 at 15:38
  • Very thorough. Thank you. I had always suspected that I could not rely on being able to use 100% of the initial capacity and had been assuming that a fullness factor of or around 75% would be the case. Glad to have confirmation that this was in the ballpark. Out of curiosity, what makes you certain that we should pick a prime in any case? – Timo Jan 31 '20 at 20:55
  • I will point out that taking a prime about 30% bigger than the **minimum** expected size is not necessarily a good optimization for memory footprint: anything above the minimum (i.e. generally the majority of cases) is likely to _just barely_ cause a resize, doubling(?) the size and leaving a lot of empty space. – Timo Jan 31 '20 at 20:57
  • Using primes minimizes clustering in the hashed table. See: [Why is it best to use a prime number as a mod in a hashing function?](https://cs.stackexchange.com/questions/11029/why-is-it-best-to-use-a-prime-number-as-a-mod-in-a-hashing-function). But do not overestimate optimization. Recently I calculated primes using the [Sieve of Eratosthenes](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes) up to 100 million (i.e. with an array `new bool[100_000_000]`). On a single thread, this took less than one second on my PC! – Olivier Jacot-Descombes Feb 01 '20 at 15:32
  • Curious that `Dictionary` takes the nearest fitting prime, but `ConcurrentDictionary` does not... especially considering that the documentation does not instruct you to choose a prime. This could be quite the performance trap to the unsuspecting developer. – Timo Feb 05 '20 at 10:41