
I've got an old and messy Fortran program that uses MPI. It has one small module written in C, which tries to determine the largest allocatable block of memory by calling malloc() iteratively until it returns NULL, and then returns the largest successful allocation size to the Fortran program.

When I compile it using gfortran, it works fine, but when I use mpif90, the last malloc() causes a segfault instead of returning NULL.

Here's the smallest illustrative example with no actual MPI code. File main.f:

program test
    complex(8) :: sig(256000000) ! Just allocating some big array in fortran
    sig(1) = 0.d0                ! and now wondering how much space is left?
    call bigalloc
end

File bigalloc.c

#include <stdlib.h>
#include <stdio.h>

void bigalloc_() {
    size_t step = 0x80000000;
    size_t size = 0;
    int failed = 0;
    void* p;
    do {
        size += step;
        p = malloc(size);
        if (p) {
            free(p);
            printf("Allocated %zu...\n", size); /* %zu: size_t is unsigned */
        } else {
            printf("So, that's our limit\n");
            failed = 1;
        }
    } while (!failed);
}

Compile and run using just gfortran (works as expected):

~$ gcc -c bigalloc.c -o bigalloc.o && gfortran -o main main.f bigalloc.o && ./main
Allocated 2147483648...
Allocated 4294967296...
So, that's our limit

Compile with MPI and run (fails):

~$ gcc -c bigalloc.c -o bigalloc.o && mpif90 -o main main.f bigalloc.o && ./main
Allocated 2147483648...
Allocated 4294967296...
Segmentation fault

Replacing gcc with mpicc changes nothing here. When main is also written in C and compiled using mpicc, everything is OK as well. So the problem is specific to the Fortran link.

The output of mpif90 -show is below. The problem depends solely on the presence of the -lopen-pal option.

$ mpif90 -show
gfortran -I/usr/include/openmpi/1.2.4-gcc/64 -I/usr/include/openmpi/1.2.4-gcc -m64 -pthread -I/usr/lib64/openmpi/1.2.4-gcc -L/usr/lib64/openmpi/1.2.4-gcc -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl

It seems that during linking MPI substitutes the standard malloc with its own one from PAL, which doesn't handle allocation failure properly. Is there a way of getting around it (e.g. by somehow linking my bigalloc.c with glibc statically)?

Serge Ts

1 Answer


Open MPI intercepts malloc() and free() calls through glibc-provided hooking mechanisms so that it can track dynamically allocated and deallocated buffers. It does that because RDMA-based networks like InfiniBand require that communication buffers be pinned (i.e. made unmovable) in physical memory so that the hardware can access them.

Registering and de-registering memory (the process of pinning and unpinning it) takes quite some time, which is why the library simply does not unpin already pinned memory, in the hope that it will be reused (this is also why dynamically allocating each communication buffer is a very bad idea). But this can cause problems if memory is dynamically allocated, then registered, and then deallocated. That's why Open MPI hooks the malloc/free API and tracks the dynamic allocations. Open MPI can also track memory using MMU notifications if the hardware supports it and the library was built accordingly.

Memory hooking, at least in newer Open MPI releases, can be disabled by setting the memory_linux_disable MCA parameter to 1. Unlike all other MCA parameters, this one can only be set via the environment: one has to set the environment variable OMPI_MCA_memory_linux_disable to 1 and export it. Don't do that if the program will run on a cluster with InfiniBand or another RDMA-based network! Unfortunately, you are running an ancient version of Open MPI that does not recognise this MCA parameter, and its memory_hooks module doesn't seem to provide any mechanism by which it could be disabled. You should really ask on the Open MPI Users mailing list.
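For the newer releases that do recognise the parameter, the workflow looks like the following sketch (the mpirun invocation is illustrative; ./main is the binary from the question):

```shell
# Disable Open MPI's Linux memory hooks for this run.
# Do NOT do this on an InfiniBand or other RDMA-based cluster.
export OMPI_MCA_memory_linux_disable=1
echo "memory_linux_disable set to: $OMPI_MCA_memory_linux_disable"

# Then launch as usual (illustrative; requires a working MPI install):
# mpirun -np 1 ./main
```

Because it is read before the MCA system initialises, passing it with mpirun's --mca flag has no effect; it must come from the exported environment.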

Also note that Linux kernels overcommit memory by default, i.e. they allow virtual memory allocations that exceed the physical memory size. The net result is that malloc() will succeed with much larger sizes than the available physical memory, but if you (or any Open MPI hook procedure) try to actually use that memory, at some point physical memory will be exhausted and the process will receive SIGSEGV.

Hristo Iliev