Alpha-Beta "breaking" the Amdahl's law?

Question

~~I have a classic minimax problem solver with additional alpha-beta pruning implementation.~~

I parallelized the algorithm in the following way:

1. Do iterative deepening until we have more nodes than available threads.
2. Run one minimax per thread, in batches of N threads. So if the serial search gives 9 possible moves at depth 2, we first start 4 threads, then another 4, and then 1 at the end, each starting at depth 2 with its own parameters.
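The batching scheme above can be sketched roughly as follows. This is only an illustration under assumptions, not the asker's actual code: trees are nested lists with integer leaves, `alphabeta`, `batch_sizes`, `search_in_batches` and `FRONTIER` are hypothetical names, and each subtree is assumed to start with a fresh (-inf, +inf) window.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Depth-first minimax with alpha-beta pruning; leaves are plain ints."""
    if isinstance(node, int):
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        value = alphabeta(child, not maximizing, alpha, beta)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            break  # remaining siblings cannot change the result
    return best


def batch_sizes(n_subtrees, n_threads):
    """Sizes of the consecutive batches, e.g. 9 subtrees on 4 threads."""
    return [min(n_threads, n_subtrees - start)
            for start in range(0, n_subtrees, n_threads)]


def search_in_batches(subtrees, n_threads, subtree_is_max=False):
    """Search the frontier subtrees n_threads at a time, one search per thread."""
    values = []
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for start in range(0, len(subtrees), n_threads):
            batch = subtrees[start:start + n_threads]
            # Assumption: every search starts with its own fresh window,
            # so no bounds are shared between threads.
            values.extend(pool.map(lambda t: alphabeta(t, subtree_is_max), batch))
    return values


# Hypothetical frontier: 9 subtrees left over from the serial phase.
FRONTIER = [[3, 5], [2, 9], [0, 7], [4, 6], [1, 8], [5, 5], [2, 2], [7, 3], [6, 1]]

print(batch_sizes(len(FRONTIER), 4))          # [4, 4, 1]
print(max(search_in_batches(FRONTIER, 4)))    # 5
```

In standard CPython the GIL keeps these threads from running the searches truly concurrently, so the sketch only illustrates the scheduling, not the timing.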

It turns out that the speedup S=T(serial)/T(parallel) for 4 threads is 4.77 so I am basically breaking Amdahl's law here.
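For reference, this is the bound in question. The symbol $p$ (the fraction of the work that can be parallelized) is introduced here for illustration and does not come from the measurements above:

$$
S(N) = \frac{T_{\text{serial}}}{T_{\text{parallel}}} = \frac{1}{(1 - p) + \dfrac{p}{N}} \le N, \qquad 0 \le p \le 1 .
$$

With $N = 4$ threads the bound gives $S \le 4$ even for $p = 1$, which is why a measured 4.77 looks like a violation. The bound assumes that the parallel run performs the same total work as the serial run.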

Assuming the implementation is not broken in some way, I suspect alpha-beta pruning is doing the magic here: because several searches start in parallel, more pruning happens, and it happens sooner. That is my theory, but I'd love it if someone could confirm this in more detail.

Just to clarify:

Minimax without alpha-beta is basically a depth-first search of the whole tree up to some max depth. With alpha-beta it does the same, except that it prunes branches that would lead to a worse result anyway.
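To make the difference concrete, here is a minimal sketch that counts visited nodes for both variants. The nested-list tree format, the `counter` argument and the two example trees are assumptions made for this illustration.

```python
import math


def minimax(node, maximizing, counter):
    """Plain depth-first minimax; counter[0] counts visited nodes."""
    counter[0] += 1
    if isinstance(node, int):
        return node
    values = [minimax(child, not maximizing, counter) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node, maximizing, counter, alpha=-math.inf, beta=math.inf):
    """Same search, but stops exploring a node once alpha >= beta."""
    counter[0] += 1
    if isinstance(node, int):
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        value = alphabeta(child, not maximizing, counter, alpha, beta)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            break  # prune the remaining children
    return best


def visited(search, tree):
    """Return (value, number of visited nodes) for a max-root search."""
    counter = [0]
    value = search(tree, True, counter)
    return value, counter[0]


BEST_FIRST = [[3, 5], [2, 9], [0, 7]]    # strongest move searched first
WORST_FIRST = [[0, 7], [2, 9], [3, 5]]   # same moves, strongest searched last

print(visited(minimax, BEST_FIRST))      # (3, 10)
print(visited(alphabeta, BEST_FIRST))    # (3, 8)
print(visited(alphabeta, WORST_FIRST))   # (3, 10)
```

Both searches return the same value, but the amount of work alpha-beta does depends on the order in which moves are examined, whereas plain minimax always visits every node.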

Edit: After further examination of the code, I found a bug on one line that caused the program to "cheat" and not follow some moves. The actual speedup factor is now 3.6. Sorry for wasting everyone's time; no breakthrough in computing today. :/

One thread can spike the L3 cache and give other cores an easier time to access memory. — Hans Passant, Feb 06 '15 at 14:40

score 1 · Answer 1 · answered Feb 06 '15 at 14:14

This can be due to cache effect or similar. It is called superlinear speedup. It can/does happen.

wilx

How would I determine if this is actually going on? How to profile cache hits? – cen Feb 06 '15 at 14:55

score 1 · Answer 2 · answered Feb 06 '15 at 14:17

Using more threads you are effectively running a partial breadth-first search. It just happens that your problem is amenable to breadth-first search.

Even on a single-core machine you would see a speedup.

You don't need threads to achieve this speedup. You can simply program a (partial) breadth-first search that behaves like multiple threads would.

Imagine you want to search two lists:

1 million times 0, then 1
1, then 1 million times 0

And you stop as soon as you find 1. If you search them sequentially, you need to look at 1,000,001 elements before you hit the 1 at the end of the first list. If you use two threads on a single core, the search will find a 1 almost immediately and you're done. A "superlinear" speedup of 1,000,000x or so!
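A minimal single-threaded sketch of this example, with round-robin interleaving standing in for two threads sharing one core. The function names and the shortened list length `N` are assumptions made for the illustration.

```python
def sequential_search(lists, target):
    """Scan the lists one after another; return how many elements were inspected."""
    steps = 0
    for lst in lists:
        for value in lst:
            steps += 1
            if value == target:
                return steps
    return None  # target not present


def interleaved_search(lists, target):
    """Inspect one element from each list in turn, like time-sliced threads."""
    steps = 0
    iterators = [iter(lst) for lst in lists]
    while iterators:
        still_running = []
        for it in iterators:
            try:
                value = next(it)
            except StopIteration:
                continue  # this list is exhausted
            steps += 1
            if value == target:
                return steps
            still_running.append(it)
        iterators = still_running
    return None  # target not present


N = 1000  # shortened from one million to keep the demo fast
first = [0] * N + [1]
second = [1] + [0] * N

print(sequential_search([first, second], 1))   # 1001
print(interleaved_search([first, second], 1))  # 2
```

No extra hardware is involved: the gain comes purely from the order in which elements are inspected.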

usr

But does alpha-beta have anything to do with it? If I take that out, I am essentially searching the whole tree up to a fixed max depth, not just finding a value like 1. So without a-b, a single-threaded CPU will visit the same number of nodes no matter what search technique is used and won't be faster. Edited my first post for clarification. – cen Feb 06 '15 at 14:29
But the number of threads does influence where you look first, right? Even very indirectly. – usr Feb 06 '15 at 14:40
Indeed. But let's say you have to search the whole tree which is the case here. In that case partial breadth-first search would not be any faster than depth-first right? – cen Feb 06 '15 at 14:47
I do not follow your algorithm 100%. Are you saying there is no way you are aborting the search early depending on what you find? If not, you're right. Then this would probably be due to cache effects, like the other answer suggests. But I consider it more likely that cache effects are not the reason, because the effect is so strong here. – usr Feb 06 '15 at 14:56
Yes, in a basic minimax implementation the whole tree is searched and it won't abort before that. But with the alpha-beta addition some branches are pruned. So maybe when using alpha-beta in a partial breadth-first search instead of depth-first, it prunes higher and faster. I guess that's the core of my question. – cen Feb 06 '15 at 15:04
