
I am trying to set up a problem in OpenMDAO and would like to make use of parallel finite difference computations. However, when I call compute_totals(), each MPI process actually computes all of the perturbed points.

I have made a minimal example that demonstrates the problem. Consider the simple case of a model that is just a matrix multiplication, y = A x; the Jacobian of such a model is simply the matrix A itself. See the code below:

import numpy as np
import time

from openmdao.api import ExplicitComponent, Problem, IndepVarComp, Group
from openmdao.utils.mpi import MPI

rank = 0 if not MPI else MPI.COMM_WORLD.rank

class MatMultComp(ExplicitComponent):
    def __init__(self, matrix, **kwargs):
        super().__init__(**kwargs)
        self.matrix = matrix

    def setup(self):
        self.add_input('x', val=np.ones(self.matrix.shape[1]))
        self.add_output('y', val=np.ones(self.matrix.shape[0]))

    def compute(self, inputs, outputs, **kwargs):
        outputs['y'] = self.matrix.dot(inputs['x'])
        # simulate an expensive analysis: each evaluation takes about one second
        time.sleep(1.)
        print('{} :: x = {}'.format(rank, np.array_str(inputs['x'])))


class Model(Group):
    def setup(self):
        matrix = np.arange(25, dtype=float).reshape(5, 5)
        self.add_subsystem('ivc', IndepVarComp('x', np.ones(matrix.shape[1])), promotes=['*'])
        self.add_subsystem('mat', MatMultComp(matrix), promotes=['*'])
        self.approx_totals(step=0.1)
        self.num_par_fd = matrix.shape[1]


if __name__ == '__main__':
    p = Problem()
    p.model = Model()
    p.setup()
    p.run_model()

    t0 = time.time()
    jac = p.compute_totals(of=['y'], wrt=['x'], return_format='array')
    dt = time.time() - t0

    if rank == 0:
        print('Took {:2.3f} seconds.'.format(dt))
        print('J = ')
        print(np.array_str(jac, precision=0))
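
As a side note, the claim that the Jacobian of this model equals the matrix itself can be checked with a quick standalone snippet (a sketch, independent of OpenMDAO): since the model is linear, each forward-difference column reproduces the corresponding column of the matrix exactly.

import numpy as np

A = np.arange(25, dtype=float).reshape(5, 5)
x = np.ones(5)
eps = 0.1

# Each forward-difference column is (A @ (x + eps * e_j) - A @ x) / eps = A @ e_j,
# i.e. the j-th column of A, so the FD Jacobian matches A up to roundoff.
J_fd = np.column_stack([(A @ (x + eps * e) - A @ x) / eps for e in np.eye(5)])
assert np.allclose(J_fd, A)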

When I run this code without MPI, I get the following output:

0 :: x = [1. 1. 1. 1. 1.]
0 :: x = [1.1 1.  1.  1.  1. ]
0 :: x = [1.  1.1 1.  1.  1. ]
0 :: x = [1.  1.  1.1 1.  1. ]
0 :: x = [1.  1.  1.  1.1 1. ]
0 :: x = [1.  1.  1.  1.  1.1]
Took 5.008 seconds.
J = 
[[ 0.  1.  2.  3.  4.]
 [ 5.  6.  7.  8.  9.]
 [10. 11. 12. 13. 14.]
 [15. 16. 17. 18. 19.]
 [20. 21. 22. 23. 24.]]

This is the correct result, and takes about 5 seconds, as expected. Now, when I run this with MPI, using 5 processes, with the command mpirun -np 5 python matmult.py, I get the following output:

0 :: x = [1. 1. 1. 1. 1.]
1 :: x = [1. 1. 1. 1. 1.]
2 :: x = [1. 1. 1. 1. 1.]
3 :: x = [1. 1. 1. 1. 1.]
4 :: x = [1. 1. 1. 1. 1.]
0 :: x = [1.001 1.    1.    1.    1.   ]
1 :: x = [1.001 1.    1.    1.    1.   ]
2 :: x = [1.001 1.    1.    1.    1.   ]
3 :: x = [1.001 1.    1.    1.    1.   ]
4 :: x = [1.001 1.    1.    1.    1.   ]
3 :: x = [1.    1.001 1.    1.    1.   ]
0 :: x = [1.    1.001 1.    1.    1.   ]
1 :: x = [1.    1.001 1.    1.    1.   ]
2 :: x = [1.    1.001 1.    1.    1.   ]
4 :: x = [1.    1.001 1.    1.    1.   ]
2 :: x = [1.    1.    1.001 1.    1.   ]
3 :: x = [1.    1.    1.001 1.    1.   ]
0 :: x = [1.    1.    1.001 1.    1.   ]
1 :: x = [1.    1.    1.001 1.    1.   ]
4 :: x = [1.    1.    1.001 1.    1.   ]
1 :: x = [1.    1.    1.    1.001 1.   ]
2 :: x = [1.    1.    1.    1.001 1.   ]
3 :: x = [1.    1.    1.    1.001 1.   ]
0 :: x = [1.    1.    1.    1.001 1.   ]
4 :: x = [1.    1.    1.    1.001 1.   ]
0 :: x = [1.    1.    1.    1.    1.001]
1 :: x = [1.    1.    1.    1.    1.001]
2 :: x = [1.    1.    1.    1.    1.001]
3 :: x = [1.    1.    1.    1.    1.001]
4 :: x = [1.    1.    1.    1.    1.001]
Took 5.072 seconds.
J = 
[[ 0.  1.  2.  3.  4.]
 [ 5.  6.  7.  8.  9.]
 [10. 11. 12. 13. 14.]
 [15. 16. 17. 18. 19.]
 [20. 21. 22. 23. 24.]]

The final result is correct, of course. However, this defeats the purpose of using MPI, because each of the 5 processes computes all of the perturbed points, so the total execution still takes about 5 seconds. I expected the following output instead:

0 :: x = [1. 1. 1. 1. 1.]
1 :: x = [1. 1. 1. 1. 1.]
2 :: x = [1. 1. 1. 1. 1.]
3 :: x = [1. 1. 1. 1. 1.]
4 :: x = [1. 1. 1. 1. 1.]
0 :: x = [1.1 1.  1.  1.  1. ]
1 :: x = [1.  1.1 1.  1.  1. ]
2 :: x = [1.  1.  1.1 1.  1. ]
3 :: x = [1.  1.  1.  1.1 1. ]
4 :: x = [1.  1.  1.  1.  1.1]
Took 1.000 seconds.
J = 
[[ 0.  1.  2.  3.  4.]
 [ 5.  6.  7.  8.  9.]
 [10. 11. 12. 13. 14.]
 [15. 16. 17. 18. 19.]
 [20. 21. 22. 23. 24.]]

Note that in reality the order in which the processes print is arbitrary, and the elapsed time would be slightly more than 1 second.

How can I get this to work as expected? Note that I am using OpenMDAO 2.5.0.

D. de Vries
1 Answer


There are a few issues here. The first is that num_par_fd should typically be passed as an __init__ arg to your Group or Component. Setting it in the Component's or Group's setup() function is too late, because OpenMDAO does all of its MPI communicator splitting in the _setup_procs function, which happens before the Component/Group setup call. The same timing issue applies to the approx_totals call: it must happen prior to the Problem setup call. Finally, the name of the attribute used internally to specify the number of parallel FD computations is actually self._num_par_fd, not self.num_par_fd. Setting the internal _num_par_fd attribute directly isn't recommended, but if you must, you'll have to set it before Problem setup is called.
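
Here is a minimal sketch of how that restructuring could look (untested; MatMultComp is the component from the question, and calling approx_totals inside the Group's __init__ is just one way to make sure it happens before Problem setup):

import numpy as np
from openmdao.api import Group, IndepVarComp, Problem

class Model(Group):
    def __init__(self, matrix, **kwargs):
        super().__init__(**kwargs)  # num_par_fd is passed through to the Group here
        self.matrix = matrix
        # configure the FD approximation of the totals before Problem setup
        self.approx_totals(method='fd', step=0.1)

    def setup(self):
        self.add_subsystem('ivc', IndepVarComp('x', np.ones(self.matrix.shape[1])), promotes=['*'])
        self.add_subsystem('mat', MatMultComp(self.matrix), promotes=['*'])

if __name__ == '__main__':
    matrix = np.arange(25, dtype=float).reshape(5, 5)
    p = Problem()
    p.model = Model(matrix, num_par_fd=matrix.shape[1])
    p.setup()
    p.run_model()
    jac = p.compute_totals(of=['y'], wrt=['x'], return_format='array')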

Note: this is a heavily edited version of my original answer.

Bret Naylor
  • This makes sense. However, I tried changing `self.num_par_fd = matrix.shape[1]` to `self._num_par_fd = matrix.shape[1]`, but I got exactly the same output. Furthermore, when I instead changed the code to use `p.model = Model(num_par_fd=5)`, I got `RuntimeError: '': num_par_fd = 5 but FD is not active.`. – D. de Vries Feb 01 '19 at 20:32