
I use Ubuntu to develop Fortran (2008+) programs with MPI. Things were pretty settled on earlier Ubuntu versions, but I am having difficulty compiling and running Fortran/MPI programs on Ubuntu 22.04, which I installed on a new PC very recently.

I first installed OpenMPI, but it wouldn't compile my programs at all, complaining that it couldn't find some include files related to mpi_f08. (I am sorry, but I can't recall the exact message, and I have since uninstalled OpenMPI.)

I had better luck with MPICH. It compiles my programs, but they crash during execution as soon as the first communication between processes should take place. A minimal example that demonstrates the issue is given below:

subroutine global_sum_real(phi_old)
  use mpi_f08
  implicit none
  real    :: phi_old
  real    :: phi_new
  integer :: error
  call mpi_allreduce(phi_old,         & ! send buffer
                     phi_new,         & ! recv buffer
                     1,               & ! length
                     mpi_real,        & ! datatype
                     mpi_sum,         & ! operation
                     mpi_comm_world,  & ! communicator
                     error)
  phi_old = phi_new
end subroutine

program global_sum_mpi
  use mpi_f08
  implicit none
  real    :: suml
  integer :: error
  call mpi_init(error)
  suml = 1.0
  call global_sum_real(suml)
  print *, suml
  call mpi_finalize(error)
end program

I hope it is clear what is happening above. The main program (global_sum_mpi) initializes MPI and calls one subroutine (global_sum_real) which is essentially an interface to MPI_Allreduce. Very simple.

If I compile it with mpifort (which reports: mpifort for MPICH version 4.0 ... gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04)) and try to run it in parallel, it crashes with the error:

Internal Error: Invalid type in descriptor

on the line that calls MPI_Allreduce. The funny thing is that if I change the MPI module I use from:

use mpi_f08

to the plain:

use mpi

everything works as expected. This is not a route I would like to take, because I believe that mpi_f08 is more up to date with later Fortran standards, and I also need mpi_f08 for better compatibility with the external PETSc libraries.

Any ideas why `use mpi_f08` is causing problems on the new Ubuntu installation?

Kind regards

Bojan Niceno
  • I can reproduce the issue, probably something fishy between the compiler and Fortran 2008 TS extensions. As a workaround, you can use an array of one `real` instead of a scalar as the send buffer for `MPI_Allreduce()`. – Gilles Gouaillardet Dec 09 '22 at 05:32
  • A simpler workaround is to use an intermediate variable `phi_tmp` and pass it to `MPI_Allreduce()`. – Gilles Gouaillardet Dec 09 '22 at 09:31
  • Thanks a lot @GillesGouaillardet, but I am ending up with quite a lot of workarounds. It seems that whenever I call an MPI function that sends an argument (with the MPICH/Ubuntu 22.04 combination), I have to introduce a little array or a temporary variable. In the mid to long term, do you think this issue will be fixed for the Fortran 2008 extensions, or will I have to live with workarounds? – Bojan Niceno Dec 09 '22 at 11:21
  • I'd ditch MPICH - from what you are saying it just looks totally broken on your box. Why didn't OpenMPI work? On my Mint box it works with just a simple apt-get. Did you try to compile OpenMPI yourself? And are you compiling MPICH yourself? If so, in both cases, what happens if you just try the packages? – Ian Bush Dec 09 '22 at 11:36
  • @IanBush: I switched to MPICH only because OpenMPI didn't work on my box. No, I didn't try to build it myself, I guess that's the next logical step. Thanks. – Bojan Niceno Dec 09 '22 at 14:19
  • When you looked at OpenMPI did you install the libopenmpi-dev package? That's the one that gives the "header files" and it works on the box I currently am on, which runs Ubuntu 20.04.5. The module seems to be in /usr/lib/x86_64-linux-gnu/openmpi/lib/mpi_f08.mod – Ian Bush Dec 09 '22 at 14:46
  • Note also there is another particularly nasty bug in the gfortran/mpich combination which is described in "https://stackoverflow.com/questions/63824065/lbound-of-an-array-changes-after-call-to-mpi-gatherv-when-using-mpi-f08-module" . As far as I am concerned this is a complete show-stopper. – Ian Bush Dec 09 '22 at 15:09
  • Well, the bug you mentioned has been fixed in upstream GCC. In this case, there is something wrong with Ubuntu. The reproducer (from my answer) works if built with `-static-libgfortran`. I also rebuilt vanilla GCC 11.2 and GCC 11.2 with Ubuntu patches (at least I did my best to mimic that) and was unable to reproduce the issue. – Gilles Gouaillardet Dec 10 '22 at 09:21
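
For reference, the temporary-variable workaround suggested in the comments above might look roughly like the sketch below. It is the question's own example with one extra local variable, `phi_tmp` (the name comes from the comment), used as the send buffer instead of the dummy argument. This is only an illustration of the idea; whether it avoids the crash on a given installation is an assumption based on the comment, not something verified here.

```fortran
! Sketch of the temporary-variable workaround from the comments.
! Assumption: passing a local variable instead of the dummy argument
! avoids the "Invalid type in descriptor" error on affected systems.
subroutine global_sum_real(phi_old)
  use mpi_f08
  implicit none
  real    :: phi_old
  real    :: phi_tmp   ! local copy of the dummy argument
  real    :: phi_new
  integer :: error

  phi_tmp = phi_old    ! send a local variable, not the dummy argument
  call mpi_allreduce(phi_tmp,         & ! send buffer
                     phi_new,         & ! recv buffer
                     1,               & ! length
                     mpi_real,        & ! datatype
                     mpi_sum,         & ! operation
                     mpi_comm_world,  & ! communicator
                     error)
  phi_old = phi_new
end subroutine

program global_sum_mpi
  use mpi_f08
  implicit none
  real    :: suml
  integer :: error
  call mpi_init(error)
  suml = 1.0
  call global_sum_real(suml)
  print *, suml        ! sum of 1.0 over all ranks
  call mpi_finalize(error)
end program
```

The drawback, as noted in the comments, is that every wrapper passing a dummy argument to MPI would need the same extra copy.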

1 Answer


The root cause is a bug in the gfortran provided by Ubuntu 22.04 (jammy).

Here is a sample program that crashes:

module mymod
  implicit none
  interface bar
    ! bind(C) routine taking an assumed-type, assumed-rank argument
    subroutine bar_f08ts (a) bind(C, name="sync")
      implicit none
      type(*), dimension(..) :: a
    end subroutine
  end interface
end module

module pub
  implicit none

  interface sub
    subroutine pub_f08ts(a)
      implicit none
      type (*), dimension(..) :: a
    end subroutine
  end interface
end module

subroutine pub_f08ts(a)
  use mymod
  implicit none
  type (*), dimension(..) :: a
  call bar(a)
end subroutine

subroutine bugsub(a)
  use pub
  implicit none
  real :: a
  ! a scalar dummy argument is forwarded to the assumed-rank interface
  call sub(a)
end subroutine

program bug
  implicit none
  real a
  a = 1
  call bugsub(a)
end program

Compiling and running it:

$ gfortran test.f90
$ ./a.out 
Internal Error: Invalid type in descriptor

Error termination. Backtrace:
#0  0x7f27b38c2ad0 in ???
#1  0x7f27b38c3649 in ???
#2  0x7f27b38c3e38 in ???
#3  0x7f27b3b058a4 in ???
#4  0x56281847220b in ???
#5  0x5628184721c4 in ???
#6  0x562818472264 in ???
#7  0x5628184722a0 in ???
#8  0x7f27b36a0d8f in __libc_start_call_main
    at ../sysdeps/nptl/libc_start_call_main.h:58
#9  0x7f27b36a0e3f in __libc_start_main_impl
    at ../csu/libc-start.c:392
#10  0x5628184720c4 in ???
#11  0xffffffffffffffff in ???

I was unable to reproduce this issue on a Red Hat box with various gfortran versions.

The right way to move forward is to report this to Ubuntu and wait for a fix.

Meanwhile, you can either use another distro, or use Open MPI (which does not use the `CFI_cdesc_t` descriptors, so the gfortran bug should not affect you). I do not understand how to use the Ubuntu packages for Open MPI (some are provided, but unless I missed something, the libraries and header files are not available), but you can build and install it from source in your home directory (no root access needed).

ADDITIONAL INFO

The issue occurs because Ubuntu's gfortran-11 uses the libgfortran from gfortran-12, and the two do not interact correctly in this case.

I reported this to the GNU folks at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108056

Gilles Gouaillardet
  • The root cause is an incompatibility in `libgfortran`. Ubuntu has `gfortran-11` use the `libgfortran` from `gcc-12`. This makes sense since the version of `libgfortran` is the same between `gcc-11` and `gcc-12` ... except that in this case, they are not compatible. I will report it to the GNU folks first to clarify whether this is a bug (e.g. compatibility should have been preserved) or a "packaging" issue (e.g. compatibility is known to be broken and the library version should have been bumped). – Gilles Gouaillardet Dec 10 '22 at 14:28
  • Thanks for digging into this and filing the bug. – janneb Dec 13 '22 at 07:25