Concurrent application not as fast as a single-threaded one

Question

I've implemented a pipeline approach. I'm traversing a tree and need certain values which aren't available beforehand, so I have to traverse the tree in parallel (or beforehand) and once more for every node for which I want to save values (descendantCount, for example).

As such, I'm iterating through the tree. From the constructor I call a method which submits a Callable to an ExecutorService, so that it runs in a new thread. The submitted Callable is:

    @Override
    public Void call() throws Exception {
        // Get descendants for every node and save it to a list.
        final ExecutorService executor =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        int index = 0;
        final Map<Integer, Diff> diffs = mDiffDatabase.getMap();
        final int depth = diffs.get(0).getDepth().getNewDepth();
        try {
            boolean first = true;
            for (final AbsAxis axis = new DescendantAxis(mNewRtx, true); index < diffs.size()
                && ((diffs.get(index).getDiff() == EDiff.DELETED && depth < diffs.get(index).getDepth()
                    .getOldDepth()) || axis.hasNext());) {
                if (axis.getTransaction().getNode().getKind() == ENodes.ROOT_KIND) {
                    axis.next();
                } else {
                    if (index < diffs.size() && diffs.get(index).getDiff() != EDiff.DELETED) {
                        axis.next();
                    }

                    final Future<Integer> submittedDescendants =
                        executor.submit(new Descendants(mNewRtx.getRevisionNumber(), mOldRtx
                            .getRevisionNumber(), axis.getTransaction().getNode().getNodeKey(), mDb
                            .getSession(), index, diffs));
                    final Future<Modification> submittedModifications =
                        executor.submit(new Modifications(mNewRtx.getRevisionNumber(), mOldRtx
                            .getRevisionNumber(), axis.getTransaction().getNode().getNodeKey(), mDb
                            .getSession(), index, diffs));
                    if (first) {
                        first = false;
                        mMaxDescendantCount = submittedDescendants.get();
                        // submittedModifications.get();
                    }
                    mDescendantsQueue.put(submittedDescendants);
                    mModificationQueue.put(submittedModifications);
                    index++;
                }
            }

            mNewRtx.close();
        } catch (final AbsTTException e) {
            LOGWRAPPER.error(e.getMessage(), e);
        }
        executor.shutdown();
        return null;
    }

Therefore, for every node it creates new Callables which traverse the tree for that node and count descendants and modifications (I'm actually fusing two tree revisions together). mDescendantsQueue and mModificationQueue are BlockingQueues. At first I only had the descendants queue and traversed the tree once more to get the modifications of every node (counting the modifications made in the subtree of the current node). Then I thought, why not do both in parallel and implement a pipelined approach? Sadly, performance seems to have decreased every time I added another multithreaded "step".

Maybe that's because an XML tree usually isn't that deep and the concurrency overhead is too heavy :-/

At first I did everything sequentially, which was the fastest:

- traverse the tree
- for every node, traverse the descendants and compute descendantCount and modificationCount

After switching to a pipelined approach with BlockingQueues, performance seems to have decreased, but I haven't actually taken any timing measurements, and I would have to revert many changes to go back :( Maybe performance will improve with more CPUs; I only have a Core2Duo for testing right now.

best regards,
Johannes

What is mNewRtx to start with? If it's something which either doesn't support concurrency or uses synchronization to handle it, that would certainly hurt. Can you tell whether your two cores are actually being used all the time? — Jon Skeet, Sep 09 '11 at 13:10
It's a read transaction which can iterate through the tree. Some methods are synchronized, but I don't use them. I think both cores have to be utilized, because the code fragment I posted is executed in another thread (and it spawns more threads for every node, but at most 2 threads, because of my notebook). I think spawning so many threads might be the actual problem. Nonetheless, in the future it might pay off and scale with more cores. That's why I did it in the first place. — Johannes, Sep 09 '11 at 13:22
This *isn't* spawning many threads, because you're using a fixed thread pool. I'm sure it's using both cores - but you should check how *fully* it's using them. — Jon Skeet, Sep 09 '11 at 13:28
Yes they are, at about 85 to 98% each while running the computation from Eclipse. — Johannes, Sep 09 '11 at 13:39

score 1 · Answer 1 · answered Sep 09 '11 at 14:49

This should probably help: Amdahl's law. What it basically says is that the achievable speedup is limited by the fraction of the code that has to run serially (for example, under synchronization). Hence, beyond that limit, even adding more computing resources won't give a better result. Ideally, if the ratio of the synchronized part to the total is low, a pool of (number of processors + 1) threads should give the best throughput (unless you are using network or other I/O, in which case you can increase the size of the pool). So read up on Amdahl's law and see if it helps.
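
To make the law concrete, here is a small sketch of the usual formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of workers. The class and method names are made up for illustration, and the value p = 0.5 is only an example, not a measurement of the code in the question.

```java
class AmdahlSketch {
    /**
     * Upper bound on the speedup according to Amdahl's law.
     * parallelFraction: share of the work that can run in parallel (0.0 to 1.0).
     * workers: number of threads or cores working on the parallel part.
     */
    static double speedup(double parallelFraction, int workers) {
        double serialFraction = 1.0 - parallelFraction;
        return 1.0 / (serialFraction + parallelFraction / workers);
    }

    public static void main(String[] args) {
        // Illustrative assumption: half of the work is serial (p = 0.5).
        for (int workers : new int[] {1, 2, 4, 8}) {
            System.out.printf("p=0.50 workers=%d -> %.2fx%n", workers, speedup(0.5, workers));
        }
    }
}
```

With half of the work serial, the bound never reaches 2x no matter how many workers are added, which is why it pays to measure how much of the traversal really runs in parallel.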

score 0 · Answer 2 · answered Sep 09 '11 at 13:10

From your description it sounds like you're recursively creating threads, each of which processes one node and then spawns a new thread? Is this correct? If so, I'm not surprised that you're suffering from performance degradation.

A simple recursive descent method might actually be the best way to do this. I can't see how multithreading will gain you any advantages here.
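
A minimal sketch of what such a recursive descent could look like, assuming a plain in-memory tree. `Node`, `countDescendants`, and `preorder` are hypothetical names for illustration only; the code in the question goes through a read transaction and an axis instead. The point is that one post-order pass computes the descendant count of every node, so no per-node traversal is needed.

```java
import java.util.ArrayList;
import java.util.List;

class SinglePassCounts {
    /** Hypothetical stand-in for a tree node; not the node type from the question. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        int descendantCount;

        Node(String name, Node... kids) {
            this.name = name;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    /** Stores descendantCount on every node of the subtree and returns the count for this node. */
    static int countDescendants(Node node) {
        int count = 0;
        for (Node child : node.children) {
            // Each child contributes itself plus all of its own descendants.
            count += 1 + countDescendants(child);
        }
        node.descendantCount = count;
        return count;
    }

    /** Second pass: collects the stored counts in preorder (depth first). */
    static void preorder(Node node, List<Integer> out) {
        out.add(node.descendantCount);
        for (Node child : node.children) {
            preorder(child, out);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("a",
            new Node("b", new Node("d"), new Node("e")),
            new Node("c"));
        countDescendants(root);
        List<Integer> counts = new ArrayList<>();
        preorder(root, counts);
        System.out.println(counts); // prints [4, 2, 0, 0, 0]
    }
}
```

Each node is visited once per pass, whereas re-traversing the subtree of every node visits deep nodes once for each of their ancestors.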

mcfinnigan


Just thought about it the same way, maybe I should get rid of the inner `Future`-Creation and just call the two inner classes without an `ExecutorService` and save the results in the `BlockingQueue`s, what do you think? Meaning I just want to do the second traversal in parallel to the first one, which needs certain node values which have to be computed on the fly. – Johannes Sep 09 '11 at 13:15
I think that if you're after speed you need to abandon the idea of threading this. Basically, synchronizing between your thread that is waiting for data and your thread that is calculating is going to add processing time, no matter what you do. I would suggest you perform a tail-recursive method to calculate the values you need, followed by the processing traversal you wish to do on the tree. It will likely be the simplest and quickest way to get the results you need. – mcfinnigan Sep 09 '11 at 13:18
Addendum to my statement above - you might start to see processing speed gains if you break the tree into subtrees. For example, process half the tree in one thread and the other half in a second thread, then join their results prior to walking. That sort of approach would scale better with additional cores, in my opinion. – mcfinnigan Sep 09 '11 at 13:25
Hm, that would be a great task for the new Fork/Join framework. The thing is, I really need the results in order (preorder/depth first), but I think that should be possible with Fork/Join. I haven't done anything with it so far... – Johannes Sep 09 '11 at 13:42
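
A rough sketch of how the subtree idea could look with Fork/Join (`java.util.concurrent`, Java 7+), assuming a plain in-memory tree that is safe to read from several threads. `Node`, `CountTask`, and `countsInPreorder` are hypothetical names; whether the read transaction from the question can be shared across threads like this is not known. The counts are computed in parallel and stored on the nodes, and the preorder requirement is met by a sequential walk afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ForkJoinCounts {
    /** Hypothetical stand-in for a tree node; not the node type from the question. */
    static final class Node {
        final List<Node> children = new ArrayList<>();
        int descendantCount;

        Node(Node... kids) {
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    /** Computes the descendant count of one node by forking a task per child subtree. */
    static final class CountTask extends RecursiveTask<Integer> {
        private final Node node;

        CountTask(Node node) {
            this.node = node;
        }

        @Override
        protected Integer compute() {
            List<CountTask> tasks = new ArrayList<>();
            for (Node child : node.children) {
                tasks.add(new CountTask(child));
            }
            int count = 0;
            // invokeAll runs the child tasks and returns once all of them are done.
            for (CountTask task : invokeAll(tasks)) {
                count += 1 + task.join();
            }
            node.descendantCount = count;
            return count;
        }
    }

    /** Sequential walk that reads the stored counts in preorder. */
    static void preorder(Node node, List<Integer> out) {
        out.add(node.descendantCount);
        for (Node child : node.children) {
            preorder(child, out);
        }
    }

    /** Parallel counting followed by an ordered, sequential collection step. */
    static List<Integer> countsInPreorder(Node root) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new CountTask(root));
        } finally {
            pool.shutdown();
        }
        List<Integer> out = new ArrayList<>();
        preorder(root, out);
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node(new Node(new Node(), new Node()), new Node());
        System.out.println(countsInPreorder(root));
    }
}
```

One task per node is very fine-grained. In practice a task would usually handle a small subtree sequentially below some size threshold, otherwise the task overhead can outweigh the gain, much like the per-node Futures in the question.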
