I'm halfway through implementing a parallel depth-first search algorithm in MPI, and I'm thinking about also trying it in CUDA / OpenCL, just for fun / out of curiosity. The algorithm is simple but not trivial. The single-core version in C is about 200 lines of code.

How suitable is GPGPU for this kind of problem?

fhucho

1 Answer

Tree search operations are not simple to implement in CUDA. There are some papers, like the one

And another, rather simple implementation (not really a massively parallel one, in my opinion):

  • "Accelerating Large Graph Algorithms on the GPU Using CUDA" Pawan Harish and P. J. Narayanan

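The difficulty comes from the fact that tree operations generally involve decision making, and according to the decisions different branches are taken. On a GPU, threads in the same warp that take different branches are executed one branch at a time, so the hardware loses much of its parallelism. Parallelizing these operations massively without overlapping or redundant work is therefore quite hard.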
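To make the branching problem a bit more concrete, here is a minimal plain-C sketch of a level-synchronous, frontier-based traversal. It is only an illustration under my own assumptions, not code from any of the papers: the CSR arrays `row_ptr` / `col_idx`, the `MAX_VERTICES` limit and the function name `bfs_levels` are all made up for the example. Each iteration of the loop over `v` stands in for what one GPU thread would do.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VERTICES 64  /* hypothetical fixed limit, just for the sketch */

/* Level-synchronous BFS on a graph in CSR form: the neighbours of v are
   col_idx[row_ptr[v]] .. col_idx[row_ptr[v + 1] - 1].
   depth[v] receives the hop distance from src, or -1 if unreachable. */
void bfs_levels(int n, const int *row_ptr, const int *col_idx,
                int src, int *depth)
{
    bool frontier[MAX_VERTICES] = { false };
    bool next[MAX_VERTICES];
    assert(n > 0 && n <= MAX_VERTICES && src >= 0 && src < n);

    for (int v = 0; v < n; ++v) depth[v] = -1;
    depth[src] = 0;
    frontier[src] = true;

    int level = 0;
    bool any = true;
    while (any) {                       /* one sweep per level */
        any = false;
        for (int v = 0; v < n; ++v) next[v] = false;

        /* On a GPU, each value of v would be handled by its own thread. */
        for (int v = 0; v < n; ++v) {
            if (!frontier[v]) continue; /* data-dependent branch: this "thread" idles */
            for (int e = row_ptr[v]; e < row_ptr[v + 1]; ++e) {
                int w = col_idx[e];
                /* Two frontier vertices may reach the same w. Run in
                   parallel, both could do this update (redundant work). */
                if (depth[w] == -1) {
                    depth[w] = level + 1;
                    next[w] = true;
                    any = true;
                }
            }
        }
        for (int v = 0; v < n; ++v) frontier[v] = next[v];
        ++level;
    }
}
```

Note how much of the work is decided by `if (!frontier[v])` and `if (depth[w] == -1)`: which vertices do anything at all depends entirely on the data, and vertices with many edges keep their thread busy far longer than their neighbours.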

Some approaches use explicit stack and queue implementations to traverse trees.
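As a rough idea of what the stack-based variant looks like, here is a small C sketch of a depth-first traversal that replaces recursion with an explicit stack. Again, this is just an assumption-laden illustration: the CSR layout, `MAX_VERTICES` and the name `dfs_iterative` are hypothetical, and a real GPU version would additionally have to decide how the stack is shared or split between threads.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VERTICES 64  /* hypothetical fixed limit, just for the sketch */

/* Iterative DFS with an explicit stack on a tree stored in CSR form:
   the children of v are col_idx[row_ptr[v]] .. col_idx[row_ptr[v + 1] - 1].
   Writes the pre-order visit sequence to order[] and returns its length. */
int dfs_iterative(int n, const int *row_ptr, const int *col_idx,
                  int root, int *order)
{
    int stack[MAX_VERTICES];
    bool seen[MAX_VERTICES] = { false };
    int top = 0, count = 0;
    assert(n > 0 && n <= MAX_VERTICES && root >= 0 && root < n);

    stack[top++] = root;
    seen[root] = true;
    while (top > 0) {
        int v = stack[--top];           /* pop */
        order[count++] = v;             /* visit */
        /* Push children in reverse so the first child is popped first.
           Each vertex is pushed at most once, so top never exceeds n. */
        for (int e = row_ptr[v + 1] - 1; e >= row_ptr[v]; --e) {
            int w = col_idx[e];
            if (!seen[w]) {
                seen[w] = true;
                stack[top++] = w;
            }
        }
    }
    return count;
}
```

For the tree 0 -> {1, 2}, 1 -> {3, 4}, 2 -> {5} this visits the nodes in the order 0, 1, 3, 4, 2, 5. Swapping the stack for a FIFO queue turns the same loop into a breadth-first traversal.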

You may find a similar question here: "Error: BFS on CUDA Synchronization"

phoad
  • "The difficulty comes from the fact that, tree operations generally involve decision making and according to the decisions different branches are taken." @phoad - I can't wait for Dynamic Parallelism :) – Mark Ebersole Oct 02 '12 at 00:24
  • There are methods like "speculative execution". Even though it decreases the degree of parallelism, it may be beneficial for tree growing and searching algorithms. – phoad Oct 02 '12 at 06:06