I keep running into a lack of guidance on choosing proper initial capacities for ConcurrentDictionary<TKey, TValue>.
My general use case is those situations where you really want to do something like the following, but cannot:
    public static class StaticCache<T>
    {
        public static readonly Action CompiledExpression = ...;
    }
This generic-based approach avoids a dictionary lookup, but can only be used if we always know the required type at compile time. If we only have a Type known at runtime, we can no longer use this approach. The next contender is a ConcurrentDictionary<TKey, TValue>.
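To make the runtime case concrete, here is a minimal sketch of the kind of cache I mean, keyed by Type. The names RuntimeTypeCache, GetOrCreate and CreateAction are hypothetical, and CreateAction is only a stand-in for whatever actually builds the compiled expression.

```csharp
using System;
using System.Collections.Concurrent;

public static class RuntimeTypeCache
{
    // Parameterless constructor: default concurrency level and default initial capacity.
    private static readonly ConcurrentDictionary<Type, Action> Cache =
        new ConcurrentDictionary<Type, Action>();

    public static Action GetOrCreate(Type type)
    {
        // Under contention the factory may be invoked more than once,
        // but only one result is stored and returned to all callers.
        return Cache.GetOrAdd(type, CreateAction);
    }

    // Hypothetical placeholder for compiling an expression for the given type.
    private static Action CreateAction(Type type)
    {
        return () => Console.WriteLine(type.Name);
    }
}
```

After the first call for a given Type, later calls are plain lookups, which is why read performance and footprint matter more to me than write throughput.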
The documentation states:
The default capacity (DEFAULT_CAPACITY), which represents the initial number of buckets, is a trade-off between the size of a very small dictionary and the number of resizes when constructing a large dictionary. Also, the capacity should not be divisible by a small prime number. The default capacity is 31.
My number of expected elements tends to be relatively small. Sometimes as small as 3 or 5, sometimes perhaps 15. As such:
- The number of insertions over the lifetime of the application will be very small, warranting a [write] concurrency level of 1, which optimizes for compactness and for read operations.
- It is preferable to have the smallest possible memory footprint, to optimize cache behavior.
Since the default initial capacity is 31, we can potentially reduce our impact on the cache (as well as increase the likelihood for our dictionary to remain in the cache) by using a smaller initial capacity.
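For reference, this is roughly how I would construct such a dictionary, using the constructor overload that takes a concurrency level and a capacity. The class name SmallCache and the value 5 are only illustrative assumptions. Whether passing the expected element count as the capacity is actually the right choice is exactly what I am unsure about.

```csharp
using System;
using System.Collections.Concurrent;

public static class SmallCache
{
    // Hypothetical expected number of entries; not a recommendation.
    private const int ExpectedElementCount = 5;

    public static ConcurrentDictionary<Type, Action> Create()
    {
        // concurrencyLevel: 1 because writes are rare;
        // capacity: the initial capacity whose exact meaning is in question.
        return new ConcurrentDictionary<Type, Action>(
            concurrencyLevel: 1,
            capacity: ExpectedElementCount);
    }
}
```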
This raises the following questions:
What does the capacity actually mean?
- (A) That the dictionary does not need to grow to hold up to this many elements?
- (B) A fixed percentage of A, depending on the dictionary's maximum "fullness", e.g. 75%?
- (C) An approximation of A or B, depending on how the actual contents' hash codes distribute them?
What does and does not constitute "a small prime"? Apparently, 31 does not. Does 11? Does 17? Does 23?
If we do happen to want a capacity near a small prime, what capacity can we choose instead? Do we simply choose the nearest non-prime number, or are primes better for capacities and should we really choose a greater prime instead?