
I have a matrix that has to be calculated in parallel from the previous values inside the matrix. It would be nice if any of you could give me a hint of how it can be done. Suppose I have a matrix like

| 4 5 6 7 8|
| 5 5 5 5 5|
| 6 6 6 6 6|
| 9 9 9 9 9|

The value at position (1,1) will be computed from its three neighboring elements (0,0), (0,1), and (1,0): it will be the minimum of those values, and so on. Every element depends on its three previous neighbors for the computation of its value. Can anyone give me a hint how this can be done in parallel? Thank you.
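
To make the dependency concrete, a plain serial version of what I mean would look roughly like this (assuming a row-major float matrix whose first row and first column are already filled in):

```
#include <math.h>

// Serial reference: each interior element becomes the minimum of its
// top, left and top-left neighbors, which are already computed.
void compute_serial(float *m, int rows, int cols)
{
    for (int i = 1; i < rows; ++i) {
        for (int j = 1; j < cols; ++j) {
            float up     = m[(i - 1) * cols + j];
            float left   = m[i * cols + (j - 1)];
            float upleft = m[(i - 1) * cols + (j - 1)];
            m[i * cols + j] = fminf(fminf(up, left), upleft);
        }
    }
}
```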

Hadi
  • The code I posted [here](http://stackoverflow.com/questions/14920931/3d-cuda-kernel-indexing-for-image-filtering/14926201#14926201) will find the minimum at each location in a 3D matrix, using a 3D (cubic) volume. It should not be too difficult to simplify it to a 2D case. – Robert Crovella May 01 '13 at 03:07
  • 1
    This is the dirty secret of parallel computing. Very many real world algos have internal dependencies that block or hinder parallelism. – David Heffernan May 01 '13 at 05:37

1 Answer


For that kind of dependency you can compute the elements of each anti-diagonal in parallel. You have to initialize the topmost row and the leftmost column, then proceed anti-diagonal by anti-diagonal, step by step:

0 0 0 0 ..
0 1 2 3 ..
0 2 3 ..
0 3 ...
..

In the schematic I denoted the pass number: 0 = initialization, 1 = first step, 2 = second step, and so on.

For example, you can compute all the cells of step 2 in parallel, then all the cells of step 3 in parallel, and so on, like a wavefront sweeping through the matrix (this is a well-known technique).

Unfortunately, since there is a data dependency between cells, you need to wait for each step to finish before proceeding to the next one. Also, since the number of elements per anti-diagonal varies, some processors will be underutilized by this method.
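
A minimal CUDA sketch of this wavefront sweep, since the question seems to target CUDA (my own illustration with hypothetical names like `wavefrontStep`; it assumes a row-major float matrix already on the device, with its first row and column initialized, and the minimum-of-three-neighbors rule from the question):

```
#include <cuda_runtime.h>
#include <algorithm>
#include <math.h>

// One thread per cell of the current anti-diagonal d (cells with i + j == d).
// Only interior cells (i >= 1, j >= 1) are updated; row 0 and column 0
// hold the initial values.
__global__ void wavefrontStep(float *m, int rows, int cols, int d)
{
    int iMin = max(1, d - (cols - 1));
    int iMax = min(rows - 1, d - 1);

    int i = iMin + blockIdx.x * blockDim.x + threadIdx.x;
    if (i > iMax) return;

    int j = d - i;
    float up     = m[(i - 1) * cols + j];
    float left   = m[i * cols + (j - 1)];
    float upleft = m[(i - 1) * cols + (j - 1)];
    m[i * cols + j] = fminf(fminf(up, left), upleft);
}

// Host-side sweep: one kernel launch per anti-diagonal. Launches on the
// same (default) stream execute in order, which gives the step-by-step
// synchronization described above.
void wavefrontSweep(float *d_m, int rows, int cols)
{
    const int threads = 256;
    for (int d = 2; d <= (rows - 1) + (cols - 1); ++d) {
        int iMin = std::max(1, d - (cols - 1));
        int iMax = std::min(rows - 1, d - 1);
        int cells = iMax - iMin + 1;
        int blocks = (cells + threads - 1) / threads;
        wavefrontStep<<<blocks, threads>>>(d_m, rows, cols, d);
    }
    cudaDeviceSynchronize();
}
```

Each launch handles exactly one anti-diagonal, so step d+1 cannot start before step d has finished; the early and late diagonals contain only a few cells, which is the underutilization mentioned above.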

isti_spl