
I have a persistent B+tree; multiple threads are reading different chunks of the tree and performing some operations on the read data. The interesting part: each thread produces a set of results, and as the end user I want to see all the results in one place. What I do: one ConcurrentDictionary, and all threads write to it.
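
A minimal sketch of the current setup (the `ProcessChunk` below is a hypothetical stand-in for reading a chunk of the tree and running the operation; `long`/`double` are placeholder key/value types):

```
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class ConcurrentCollect
{
    // Hypothetical stand-in for reading a chunk of the tree and computing results.
    static IEnumerable<KeyValuePair<long, double>> ProcessChunk(int chunk)
    {
        yield return new KeyValuePair<long, double>(chunk, chunk * 0.5);
    }

    static IDictionary<long, double> CollectShared(int chunkCount)
    {
        var results = new ConcurrentDictionary<long, double>();

        // All workers write into the same shared dictionary (thread-safe,
        // but every write pays the synchronisation cost).
        Parallel.For(0, chunkCount, chunk =>
        {
            foreach (var kv in ProcessChunk(chunk))
                results[kv.Key] = kv.Value;
        });

        return results;
    }
}
```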

Everything works smoothly this way. But the application is time critical; one extra second means total dissatisfaction. ConcurrentDictionary, because of its thread-safety overhead, is intrinsically slower than Dictionary.

I could use Dictionary instead, with each thread writing its results to its own dictionary. But then I would have the problem of merging the different dictionaries.
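
A sketch of that per-thread alternative, with the same hypothetical `ProcessChunk` and placeholder key/value types; each worker fills its own plain `Dictionary` and the partials are copied into one result at the end:

```
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

class PerThreadCollect
{
    // Same hypothetical stand-in as above.
    static IEnumerable<KeyValuePair<long, double>> ProcessChunk(int chunk)
    {
        yield return new KeyValuePair<long, double>(chunk, chunk * 0.5);
    }

    static Dictionary<long, double> CollectThenMerge(int chunkCount, int workerCount)
    {
        var partials = new Dictionary<long, double>[workerCount];

        // Each worker fills its own plain Dictionary -- no locking while processing.
        Parallel.For(0, workerCount, w =>
        {
            var local = new Dictionary<long, double>();
            for (int chunk = w; chunk < chunkCount; chunk += workerCount)
                foreach (var kv in ProcessChunk(chunk))
                    local[kv.Key] = kv.Value;
            partials[w] = local;
        });

        // Post-processing merge: a single-threaded copy into one result.
        // Pre-sizing the target avoids repeated rehashing during the copy.
        var merged = new Dictionary<long, double>(partials.Sum(p => p.Count));
        foreach (var partial in partials)
            foreach (var kv in partial)
                merged[kv.Key] = kv.Value;
        return merged;
    }
}
```

The merge here is the naive single-threaded copy; pre-sizing the target dictionary at least avoids repeated rehashing, but it is still the part I would like to avoid or speed up.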


My Questions:

  1. Are concurrent collections a good choice for my scenario?
  2. If not (1), how would I optimally merge the different dictionaries? Given that (a) copying items one by one and (b) LINQ are known solutions and are not as fast as expected :)
  3. If not (2) ;-) what would you suggest instead?


Some quick info:

  • #Threads = processorCount. The application can run on anything from a standard laptop (e.g., 4 threads) to a high-end server (e.g., up to 32 threads).
  • Item count: the tree usually holds more than 1.0E+12 items.
Dr. Strangelove
  • How many different threads are we talking about and how many entries are you typically adding to the ConcurrentDictionary at the moment? – Phil Wright Feb 16 '15 at 23:33
  • How long is it typically taking at the moment and how fast does it need to be? – Phil Wright Feb 16 '15 at 23:35
  • @PhilWright those are good observations; since they could be a concern at first sight, I updated the question. – Dr. Strangelove Feb 16 '15 at 23:38
  • @PhilWright, right now it typically takes 4 seconds, but it must be below 500 ms. Without the concurrent dictionary (i.e., without saving results :D) it even goes below 300 ms. – Dr. Strangelove Feb 16 '15 at 23:39
  • What do you do with the resulting Dictionary (you might be able to delay some of the processing to the later stage of retrieving data, if you do not need too many results)? – xpa1492 Feb 17 '15 at 02:46

2 Answers


From your timings it seems that the locking/building of the result dictionary is taking 3700ms per thread with the actual processing logic taking just 300ms.

I suggest that, as an experiment, you let each thread create its own local dictionary of results. Then you can see how much time is spent building the dictionaries compared to how much is lost to locking across threads.

If building the local dictionary adds more than 300ms, then it will not be possible to meet your time limit, because even without any locking or any attempt to merge the results it would already take too long.
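
A rough sketch of that experiment, assuming placeholder key/value types and dummy contents; only the per-worker build time matters here:

```
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

class BuildCostExperiment
{
    // Measures only how long each worker needs to build its private dictionary,
    // with no shared state and no locking involved at all.
    static void Run(int workerCount, int itemsPerWorker)
    {
        var perWorkerMs = new long[workerCount];

        Parallel.For(0, workerCount, w =>
        {
            var sw = Stopwatch.StartNew();
            var local = new Dictionary<long, double>(itemsPerWorker);
            for (int i = 0; i < itemsPerWorker; i++)
                local[w * (long)itemsPerWorker + i] = i;   // placeholder results
            sw.Stop();
            perWorkerMs[w] = sw.ElapsedMilliseconds;
        });

        Console.WriteLine("Slowest worker built its dictionary in {0} ms", perWorkerMs.Max());
    }
}
```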

Update

It seems that you can either pay the merge price as you go along, with the locking causing the threads to sit idle for a significant percentage of the time, or pay the price in a post-processing merge. But the core problem is that the locking means you are not fully utilising the available CPU.

The only real solution to getting maximum performance from your cores is to use a non-blocking dictionary implementation that is also thread-safe. I could not find a .NET implementation, but I did find a research paper detailing an algorithm which indicates it is possible.

Implementing such an algorithm correctly is not trivial but would be fun!

Scalable and Lock-Free Concurrent Dictionaries

Phil Wright
  • It's a good observation; I remember I did this before moving to concurrent collections. I don't remember the overhead of creating the individual dictionaries, but I do remember a very expensive merge. BTW, the number of output entries totally depends on the operation: it could be 10, it could be billions. – Dr. Strangelove Feb 17 '15 at 00:06
  • Thanks indeed, the paper looks fine and has lots of pseudo-code ;-) I'll try to contact the authors to see if there is any publicly available implementation; otherwise, as you said, it would be fun implementing it :) – Dr. Strangelove Feb 17 '15 at 07:53
  • While a lock-free dictionary is great and all (that is, if you actually have contention; the chance is small either way though), doing the post-processing or just creating a master data structure that offers a simple interface to query the n underlying dictionaries (more lookup time, but no copying) should be much faster in this situation; see the sketch after these comments. – Voo Feb 17 '15 at 17:00
  • That said, the state of the art in concurrent lock-free dictionaries is Cliff Click's NonBlockingHashMap for Java (there's also a paper and a JavaOne talk around). It's a state-machine-based solution that allows you to actually prove its correctness. It should be relatively trivial to port to C# (I can see at least one optimization if you go native, though). I'd be surprised if nobody had done so. – Voo Feb 17 '15 at 17:04
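
Picking up the suggestion in the comments above, here is a minimal sketch of such a master structure over the n per-thread dictionaries (generic placeholder types; it assumes the partials hold disjoint keys, so the first hit wins):

```
using System.Collections.Generic;

// A thin read-only facade over the per-thread dictionaries: lookups probe each
// partial in turn, so nothing is ever copied or merged.
class CompositeLookup<TKey, TValue>
{
    private readonly IList<Dictionary<TKey, TValue>> _parts;

    public CompositeLookup(IList<Dictionary<TKey, TValue>> parts)
    {
        _parts = parts;
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        foreach (var part in _parts)
            if (part.TryGetValue(key, out value))
                return true;

        value = default(TValue);
        return false;
    }

    // Enumerates every entry across all partials without building a merged copy.
    public IEnumerable<KeyValuePair<TKey, TValue>> All()
    {
        foreach (var part in _parts)
            foreach (var kv in part)
                yield return kv;
    }
}
```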

Have you considered async persistence?

Is it allowed in your scenario?

You can hand the results off to a queue serviced by a separate thread pool (creating a thread pool avoids the overhead of creating a (sub)thread for each request), and there you can handle the merging logic without affecting response time.
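
A rough sketch of that idea, using a `BlockingCollection` drained by a single background task (placeholder key/value types; `Publish` is a hypothetical hand-off each worker would call once its batch is ready):

```
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class AsyncMerger
{
    // Workers hand finished batches to the queue; one background task drains it
    // and merges, so the time-critical path never waits on the merge itself.
    private readonly BlockingCollection<Dictionary<long, double>> _queue =
        new BlockingCollection<Dictionary<long, double>>();

    private readonly Dictionary<long, double> _merged = new Dictionary<long, double>();

    public Task Start()
    {
        return Task.Run(() =>
        {
            foreach (var batch in _queue.GetConsumingEnumerable())
                foreach (var kv in batch)
                    _merged[kv.Key] = kv.Value;   // only this task ever touches _merged
        });
    }

    // Hypothetical hand-off: each worker calls this once its local batch is done.
    public void Publish(Dictionary<long, double> batch)
    {
        _queue.Add(batch);
    }

    // Call after all workers have finished so the merger task can complete.
    public void CompleteAdding()
    {
        _queue.CompleteAdding();
    }

    public IDictionary<long, double> Merged
    {
        get { return _merged; }
    }
}
```

Because only the merger task ever writes to the result dictionary, a plain `Dictionary` suffices and the workers never block on it.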

  • I'm afraid async storage is not allowed, since the current design is the outcome of a considerable amount of team work/agreement for various reasons. The data structure provides **built-in** threading features, and any custom operations (e.g., the one this question regards) are passed in via the *strategy pattern*. Therefore, a thread pool is not a possible option. – Dr. Strangelove Feb 16 '15 at 23:52