
The software is written in C++ and must use only the Standard C++ library.

Hi, the problem I'm facing is the following: I have to parallelize a piece of software, but the completion time of the multithreaded version is too random. About 50% of the time it is faster than the sequential version, and 50% of the time it is slower. I think this is due to a wrong design choice, and I would like you to show me how I can correct it.

The software is based on a tree structure (not binary) that keeps growing, and each new node could be a possible solution. Once the software finds a solution, the program stops. The problem is that in the sequential version the path the software follows to compute the nodes is, of course, always the same, so it always needs a fixed time to complete its task. In the multithreaded version, I have a task pool where I insert the nodes, and the threads keep fetching jobs from the task pool and pushing back the new nodes. The order of computation is, of course, not deterministic, so sometimes this way of working leads the multithreaded version to perform a greater number of computations, and therefore to a greater completion time.

So, imagine a tree structure that keeps growing. The root is given, and you put the nodes in a queue. You start by computing the first node: if it is a solution, you terminate; otherwise you compute that node and push all the resulting sub-nodes to the back of the queue. There are multiple solutions and you don't know which nodes they are in, so you just have to compute each node until you discover a solution. The sequential version will always follow the same path, and so will always need to compute N nodes before reaching the first solution, while the multithreaded version can be unlucky and take different paths that have no solutions, and therefore reach the first solution in more steps.
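The sequential procedure described above can be sketched as a breadth-first search that stops at the first solution. This is a minimal sketch under assumptions: the tree is generated on demand by a hypothetical `expand` callable that returns a node's children, `is_solution` is a hypothetical predicate, and `examined` counts how many nodes were processed, which corresponds to the N in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <queue>
#include <vector>

// Breadth-first search over a tree that is generated on the fly.
// Hypothetical callables standing in for the real problem:
//   expand(node)      -> std::vector<Node> with the node's children
//   is_solution(node) -> true if the node is a solution
// `examined` receives the number of nodes processed (the N of the question).
template <typename Node, typename Expand, typename IsSolution>
std::optional<Node> sequential_search(const Node& root, Expand expand,
                                      IsSolution is_solution,
                                      std::size_t& examined) {
    std::queue<Node> pending;  // FIFO: nodes are processed in insertion order
    pending.push(root);
    examined = 0;
    while (!pending.empty()) {
        Node current = pending.front();
        pending.pop();
        ++examined;
        if (is_solution(current)) return current;  // first solution: stop here
        for (const Node& child : expand(current)) pending.push(child);
    }
    return std::nullopt;  // the whole tree was explored without a solution
}
```

Because `std::queue` is first-in first-out and only one thread touches it, the order in which nodes are examined, and therefore N, is the same on every run.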

How could you ensure that the multithreaded version will always do at most N steps before reaching the first solution? Otherwise, the advantage of having multiple threads is lost if you need to compute many more steps.

If needed, I will post the code, but as I said, it's just a tree structure, a task pool implemented as a queue, and so on.

  • There are many things that could go wrong when writing the kind of program OP is tackling. However, I believe there is a single, clear question: "The sequential version will always follow the same path, and so will always need to compute N nodes before reaching the first solution, while the multithreaded version can be unlucky and take different paths that have no solutions, and therefore reach the first solution in more steps. How could you ensure that the multithreaded version will always do at most N steps before reaching the first solution?" Why was the question closed? – Patrick Aug 26 '20 at 00:29

To rephrase your question: can I guarantee that my parallel program will take no more time than the sequential version?

Short answer: "it can be difficult, and it depends on specific details of your program and problem."

Longer answer:

I think your analysis is quite good:

So, imagine a tree structure that keeps growing. The root is given, and you put the nodes in a queue. You start by computing the first node: if it is a solution, you terminate; otherwise you compute that node and push all the resulting sub-nodes to the back of the queue. There are multiple solutions and you don't know which nodes they are in, so you just have to compute each node until you discover a solution. The sequential version will always follow the same path, and so will always need to compute N nodes before reaching the first solution, while the multithreaded version can be unlucky and take different paths that have no solutions, and therefore reach the first solution in more steps.

Some side notes: What did you use to implement your queue? This choice is quite critical. If your threads cannot concurrently put nodes into and remove nodes from this queue, you might have a performance bottleneck there. Also, you may not need to put every single node of the exploration tree in that queue. If there are enough nodes to keep every thread busy, you don't need to add more. A typical approach is to choose a depth cut-off beyond which tree nodes are not placed in the queue but are processed sequentially by the thread that generated them. Whether such a cut-off is beneficial, and the depth at which it should be set, depends on your problem.
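The depth cut-off idea might look roughly like this. It is a sketch, not a drop-in implementation: `Task`, `explore_locally`, `handle_task` and the `push_shared` callback (which in a real program would lock the shared queue and insert the task) are hypothetical names, and exploring the subtree depth-first on the local thread is an assumption, only one of several possible choices.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// A node together with its depth in the exploration tree.
template <typename Node>
struct Task {
    Node node;
    std::size_t depth;
};

// Searches the whole subtree below `node` depth-first on the calling thread,
// without touching the shared queue. `stop` lets the search be abandoned as
// soon as some other thread reports a solution.
template <typename Node, typename Expand, typename IsSolution>
std::optional<Node> explore_locally(const Node& node, Expand& expand,
                                    IsSolution& is_solution,
                                    const std::atomic<bool>& stop) {
    if (stop.load()) return std::nullopt;
    if (is_solution(node)) return node;
    for (const Node& child : expand(node)) {
        if (std::optional<Node> found = explore_locally(child, expand, is_solution, stop))
            return found;
    }
    return std::nullopt;
}

// What a worker does with one task taken from the shared queue. Above the
// cut-off depth, children go back to the shared queue through push_shared so
// that idle threads can pick them up; at or beyond it, the subtree is
// searched locally.
template <typename Node, typename Expand, typename IsSolution, typename Push>
std::optional<Node> handle_task(const Task<Node>& task, std::size_t cutoff,
                                Expand& expand, IsSolution& is_solution,
                                const std::atomic<bool>& stop, Push& push_shared) {
    if (task.depth >= cutoff)
        return explore_locally(task.node, expand, is_solution, stop);
    if (is_solution(task.node)) return task.node;
    for (const Node& child : expand(task.node))
        push_shared(Task<Node>{child, task.depth + 1});
    return std::nullopt;
}
```

The cut-off is a trade-off: a deeper one puts more, smaller tasks into the shared queue (better load balancing, more locking), while a shallower one means fewer, larger tasks and less contention.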

Provided the matter above is not a source of performance problems, to guarantee that one thread of your parallel version finds a solution no later than your sequential version does, at least one thread needs to take the same path as the sequential thread would.

How to modify your program to implement this may or may not be tricky. If you can sort the nodes according to the traversal order of your sequential program, you should be able to order the queue so that the nodes your sequential program would have explored first are the ones your threads take first. This will guarantee that at least one thread takes the path your sequential program would have taken.
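For a breadth-first sequential traversal, one way to obtain such an ordering is to label each node with the list of child indices leading to it from the root: shallower nodes come first, and nodes at the same depth are visited in lexicographic order of their labels. The sketch below assumes that encoding (`OrderedNode`, `VisitedLater`, `SharedFrontier` and `child_of` are hypothetical names) and uses `std::priority_queue` in place of a FIFO queue; in a multithreaded program it would still need to be protected by a mutex.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical node record: `path` holds the child index taken at each level
// from the root (the root has an empty path). Encoding nodes this way is an
// assumption; any key that reproduces the sequential visiting order works.
struct OrderedNode {
    std::vector<int> path;
};

// True if `a` would be visited AFTER `b` by a sequential breadth-first search:
// shallower nodes come first, and nodes at the same depth are visited in
// lexicographic order of their paths.
struct VisitedLater {
    bool operator()(const OrderedNode& a, const OrderedNode& b) const {
        if (a.path.size() != b.path.size()) return a.path.size() > b.path.size();
        return a.path > b.path;
    }
};

// std::priority_queue keeps on top the element that no other element
// outranks under the comparator; with VisitedLater that is the node the
// sequential program would visit first.
using SharedFrontier =
    std::priority_queue<OrderedNode, std::vector<OrderedNode>, VisitedLater>;

// A child's path is its parent's path plus its own index among the siblings.
inline OrderedNode child_of(const OrderedNode& parent, int index) {
    OrderedNode child = parent;
    child.path.push_back(index);
    return child;
}
```

Whatever order the threads push nodes in, they pop them in sequential order. For a depth-first sequential program, comparing the paths lexicographically alone (dropping the depth test) would give the corresponding pre-order.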

Giving nodes a relative priority in your queue (whatever the meaning behind this ordering is) is akin to implementing a heuristic.

[edit] First, we need to distinguish between CPU time and elapsed time. If you have a program that runs two threads for 1 minute, that's 1 minute of elapsed time and 2 minutes of CPU time. In the kind of problem you are tackling, you are trying to reduce the total elapsed time. It is impossible to consistently reduce the CPU time as well, because the computation done by the other threads until one of them finds a solution is "wasted": it does not contribute to the end result. Using multiple threads will almost always mean doing more computation (more CPU time). If your parallel implementation gives you significantly longer elapsed times, then I suspect you have a severe bottleneck somewhere (your shared queue, perhaps?).
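To see this distinction in practice, it can help to measure both quantities. The sketch below assumes a shared FIFO queue guarded by a `std::mutex` and a `std::condition_variable`, nodes represented as plain `int`s, and hypothetical `expand`/`is_solution` callables that are safe to call from several threads; `parallel_search` and `SearchReport` are illustrative names. The total number of nodes examined by all threads is used as a rough proxy for CPU work, and `std::chrono::steady_clock` measures elapsed time.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

// Result of one run: the solution (if any), the number of nodes examined by
// all threads together (a rough proxy for CPU work) and the wall-clock time.
struct SearchReport {
    std::optional<int> solution;
    std::size_t nodes_examined = 0;
    double elapsed_ms = 0.0;
};

template <typename Expand, typename IsSolution>
SearchReport parallel_search(int root, unsigned num_threads, Expand expand,
                             IsSolution is_solution) {
    assert(num_threads > 0);
    std::deque<int> pending{root};  // shared FIFO queue, guarded by m
    std::mutex m;
    std::condition_variable cv;
    std::size_t busy = 0;           // threads currently working on a node
    bool stop = false;              // set (under m) once a solution is found
    std::optional<int> solution;    // guarded by m
    std::atomic<std::size_t> examined{0};

    auto worker = [&] {
        for (;;) {
            int node;
            {
                std::unique_lock<std::mutex> lock(m);
                // Wait for work, a solution, or proof that no work is left.
                cv.wait(lock, [&] { return stop || !pending.empty() || busy == 0; });
                if (stop || pending.empty()) return;
                node = pending.front();
                pending.pop_front();
                ++busy;
            }
            ++examined;  // every node examined by any thread is counted
            const bool hit = is_solution(node);
            std::vector<int> children;
            if (!hit) children = expand(node);  // expansion runs outside the lock
            {
                std::lock_guard<std::mutex> lock(m);
                --busy;
                if (hit && !solution) { solution = node; stop = true; }
                for (int c : children) pending.push_back(c);
            }
            cv.notify_all();
        }
    };

    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < num_threads; ++i) threads.emplace_back(worker);
    for (std::thread& t : threads) t.join();
    const auto end = std::chrono::steady_clock::now();

    SearchReport report;
    report.solution = solution;
    report.nodes_examined = examined.load();
    report.elapsed_ms = std::chrono::duration<double, std::milli>(end - start).count();
    return report;
}
```

Running it on the same input with one thread and then with several makes the trade-off visible: `nodes_examined` tends to grow with the thread count even when `elapsed_ms` shrinks. If `elapsed_ms` grows instead, the time is probably going somewhere other than node expansion, for example into contention on the shared queue.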

Second, when I talk about a path, I mean a thread's entire path, including the branches that did not lead to a solution. If you want to reduce these "bad paths", what you want is a heuristic that makes threads explore first the paths that seem more likely to lead to a solution. If you can come up with a good heuristic for your specific problem, it will benefit both your sequential and your parallel computations.
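A heuristic can be plugged into the same shared-queue design by ordering the queue by a score. The sketch below assumes a hypothetical problem-specific score (higher means more promising) and breaks ties by insertion order, so equally scored nodes are served first-in first-out like the original queue. `ScoredNode`, `LessPromising` and `HeuristicFrontier` are illustrative names; in a multithreaded program every call would need to be made while holding the queue's mutex.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical queue entry: `score` comes from a problem-specific heuristic
// (higher = more promising); `seq` is an insertion counter used to break
// ties deterministically.
struct ScoredNode {
    int node;
    double score;
    std::uint64_t seq;
};

// True if `a` should be served after `b`.
struct LessPromising {
    bool operator()(const ScoredNode& a, const ScoredNode& b) const {
        if (a.score != b.score) return a.score < b.score;  // lower score waits
        return a.seq > b.seq;                               // later insertion waits
    }
};

class HeuristicFrontier {
public:
    void push(int node, double score) {
        heap_.push(ScoredNode{node, score, next_seq_++});
    }
    bool empty() const { return heap_.empty(); }
    // Removes and returns the most promising node; the frontier must not be empty.
    int pop() {
        assert(!heap_.empty());
        int best = heap_.top().node;
        heap_.pop();
        return best;
    }

private:
    std::priority_queue<ScoredNode, std::vector<ScoredNode>, LessPromising> heap_;
    std::uint64_t next_seq_ = 0;
};
```

With a constant score this degenerates to the plain FIFO order, so the heuristic only changes the order where it actually distinguishes nodes.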

Patrick
  • "What did you use to implement your queue?" I used a std::list of Nodes. You get a node under a lock_guard, then you compute the node, and you push all the resulting sub-nodes to the back of the list, again under a lock_guard to keep the list thread-safe. – GiovanniSan Aug 22 '20 at 13:14
  • "at least one thread needs to take the same path as the sequential thread would." It happens, but the problem is that this path can be preceded by a lot of wrong paths, so the multithreaded version may take the "correct path" (there are several solutions, so there is more than one correct path) only after many wrong paths, resulting in more computations than the sequential version and therefore a greater completion time. – GiovanniSan Aug 22 '20 at 13:17