
Has anyone ever sent CUDA arrays over MPI with the most recent mpi4py (and PyCUDA 2015.1.3)? To send an array, the respective data type must be converted to a contiguous buffer. This conversion is done with the following lambda:

    to_buffer = lambda arr: None if arr is None else arr.gpudata.as_buffer(arr.nbytes)

The complete script looks as follows:

    import numpy
    from mpi4py import MPI

    import pycuda.gpuarray as gpuarray
    import pycuda.driver as cuda
    import pycuda.autoinit

    to_buffer = lambda arr: None if arr is None else arr.gpudata.as_buffer(arr.nbytes)

    print "pyCUDA version " + str(pycuda.VERSION )
    a_gpu = gpuarray.to_gpu(numpy.random.randn(4,4).astype(numpy.float32))

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    comm.Bcast([to_buffer(a_gpu), MPI.FLOAT], root=0)

Unfortunately, all this beauty crashes with the following errors:

    pyCUDA version (2015, 1, 3)
    Traceback (most recent call last):
      File "./test_mpi.py", line 21, in <module>
        comm.Bcast([ to_buffer( numpy.random.randn(4,4).astype(numpy.float32)) , MPI.FLOAT], root=0)
      File "Comm.pyx", line 405, in mpi4py.MPI.Comm.Bcast (src/mpi4py.MPI.c:66743)
      File "message.pxi", line 388, in mpi4py.MPI._p_msg_cco.for_bcast (src/mpi4py.MPI.c:23220)
      File "message.pxi", line 355, in mpi4py.MPI._p_msg_cco.for_cco_send (src/mpi4py.MPI.c:22959)
      File "message.pxi", line 111, in mpi4py.MPI.message_simple (src/mpi4py.MPI.c:20516)
      File "message.pxi", line 51, in mpi4py.MPI.message_basic (src/mpi4py.MPI.c:19644)
      File "asbuffer.pxi", line 108, in mpi4py.MPI.getbuffer (src/mpi4py.MPI.c:6757)
      File "asbuffer.pxi", line 50, in mpi4py.MPI.PyObject_GetBufferEx (src/mpi4py.MPI.c:6093)
    TypeError: expected a readable buffer object

Any ideas what's going on? Does anyone have an alternative buffer-conversion mantra?

Thanks in advance!

Vast Academician
  • Plain MPI requires objects which support the buffer protocol in *host* memory. A `DeviceAllocation` is in device memory. I don't think that could ever work. – talonmies Sep 07 '15 at 17:33
  • @talonmies, maybe you're right, but I think that `a_gpu` is indeed located on the GPU and that `to_buffer` will copy it to the host. If this is wrong, please explain in more detail. Thank you! – Vast Academician Sep 08 '15 at 07:19
  • You can read the `as_buffer` documentation [here](http://documen.tician.de/pycuda/driver.html#pycuda.driver.DeviceAllocation) and its source [here](https://github.com/inducer/pycuda/blob/fde69b0502d944a2d41e1f1b2d0b78352815d487/src/cpp/cuda.hpp#L1547). I don't see anywhere that a device-to-host copy would be initiated by creating a buffer object from a `DeviceAllocation`. Do you? – talonmies Sep 08 '15 at 09:06
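
For illustration, a minimal sketch of the explicit device-to-host staging discussed in the comments above (the variable names mirror the question's script; the host-side buffer and the `memcpy_dtoh` call are additions, and the script is assumed to be launched under an MPI launcher such as `mpirun`):

    import numpy
    from mpi4py import MPI

    import pycuda.autoinit          # creates a CUDA context on this rank
    import pycuda.driver as cuda
    import pycuda.gpuarray as gpuarray

    comm = MPI.COMM_WORLD

    # Device array, as in the question's script.
    a_gpu = gpuarray.to_gpu(numpy.random.randn(4, 4).astype(numpy.float32))

    # Stage the data in host memory first: host_buf is a numpy array and
    # therefore supports the buffer protocol that MPI expects.
    host_buf = numpy.empty((4, 4), dtype=numpy.float32)
    cuda.memcpy_dtoh(host_buf, a_gpu.gpudata)   # explicit device -> host copy

    comm.Bcast([host_buf, MPI.FLOAT], root=0)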

1 Answer


All that is needed is to call the MPI broadcast with a valid host memory buffer object or numpy array, for example:

    comm.Bcast(a_gpu.get(), root=0)

in place of the lambda that transforms the `DeviceAllocation` object into a buffer object.

talonmies
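
For completeness, a minimal end-to-end sketch built on this answer: the `a_gpu.get()` staging is exactly what the answer suggests, while the per-rank receive buffer, the upload back to the GPU with `gpuarray.to_gpu`, and the assumption that the script runs under an MPI launcher (e.g. `mpirun -n 2 python test_mpi.py`) are added for illustration:

    import numpy
    from mpi4py import MPI

    import pycuda.autoinit          # one CUDA context per MPI rank
    import pycuda.gpuarray as gpuarray

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    shape, dtype = (4, 4), numpy.float32

    if rank == 0:
        # The root creates the device array and copies it down to the host.
        a_gpu = gpuarray.to_gpu(numpy.random.randn(*shape).astype(dtype))
        a_host = a_gpu.get()                    # device -> host copy
    else:
        # Non-root ranks need a host-side receive buffer of matching shape and dtype.
        a_host = numpy.empty(shape, dtype=dtype)

    comm.Bcast([a_host, MPI.FLOAT], root=0)     # broadcast through host memory

    # Every rank can now upload the received data back to its own GPU.
    a_gpu = gpuarray.to_gpu(a_host)
    print("rank %d got:\n%s" % (rank, a_gpu.get()))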