I am running a 30-core Python job on a cluster using MPIPool. When I delete the job with the usual qdel <job ID> command, only the parent process is killed, while the child processes continue to run. In other words: qdel makes the job ID disappear from the queue, but the 30 (= number of cores) spawned Python processes remain in the background, contributing heavily to the cluster load. Furthermore, I can only kill the background processes manually, and only on the one node I am logged into.

Another thing that complicates matters is that my Python script calls a piece of Fortran code (I use the f2py module to achieve this). I have noticed in the past, when running the programme locally, that the Fortran part does not respond to a Ctrl+C interrupt; the programme is only aborted once control returns to the Python layer.
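As far as I understand, CPython only raises KeyboardInterrupt once control returns from the compiled code to the interpreter. A minimal sketch of a workaround, assuming that behaviour, is to restore the default SIGINT disposition so that Ctrl+C terminates the process immediately, even inside the Fortran call; fortran_module.heavy_computation below is only a hypothetical stand-in for my f2py-wrapped routine:

import signal

# CPython delivers KeyboardInterrupt only when control returns to the
# interpreter, so a long f2py call appears to ignore Ctrl+C. Restoring the
# default SIGINT disposition makes Ctrl+C terminate the process at once,
# even while it is inside the compiled Fortran code.
signal.signal(signal.SIGINT, signal.SIG_DFL)

# Hypothetical stand-in for the expensive f2py-wrapped call:
# result = fortran_module.heavy_computation(params)

The downside is that the process then dies without any Python-level cleanup, so I would only use this for local test runs.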

I have consulted the documentation of MPIPool, which I use to parallelise the job, but I have not managed to pinpoint where exactly things go wrong. Ideally, I would like each child process to check on its parent regularly and to terminate itself when it notices that the parent is no longer there. At the moment, deleting the job seems to simply cut the rope that ties parent and children together, without killing the children.
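For illustration, this is roughly the kind of watchdog I have in mind: a daemon thread in each worker that polls os.getppid() and shuts the process down once the parent has disappeared (the child is then typically re-parented to PID 1). This is only a sketch, not something MPIPool provides, and the check can be delayed while a GIL-holding Fortran call is running:

import os
import threading
import time

def watch_parent(poll_interval=30):
    """Terminate this worker once its parent process disappears."""
    original_ppid = os.getppid()

    def _watch():
        while True:
            # When the parent dies, the child is re-parented (usually to
            # PID 1), so a change in os.getppid() signals the cut rope.
            if os.getppid() != original_ppid:
                os._exit(1)  # exit immediately, without any cleanup
            time.sleep(poll_interval)

    threading.Thread(target=_watch, daemon=True).start()

# Intended use in the worker branch, just before pool.wait():
# watch_parent()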

The snippet below shows how the pool object is integrated into my main code. In addition, I use a bash script to submit the job to the cluster queue (it contains echo 'mpirun -np '$NCORES' python '$SKRIPTNAME >> $TMPFILE) and to request the number of cores I want to use. That part should work fine.

import sys

import emcee
from emcee.utils import MPIPool

pool = MPIPool()

# Worker processes wait for tasks from the master and exit afterwards.
if not pool.is_master():
    pool.wait()
    sys.exit(0)

sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob, pool=pool)
pos, prob, state = sampler.run_mcmc(p0, 1000)  # p0 contains the initial walker positions

pool.close()

Background: I use the emcee module to carry out a Monte Carlo Simulation. lnprob is a likelihood function that is evaluated for the parameter set being drawn in a particular iteration. lnprob calls on a Fortran script that handles the computationally expensive parts.
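For context, lnprob is essentially a thin wrapper around the compiled routine. The sketch below is purely illustrative; fortranlib and loglike are placeholder names, not the real module:

import numpy as np
# import fortranlib  # hypothetical f2py module, e.g. built with: f2py -c -m fortranlib loglike.f90

def lnprob(theta):
    # In the real code the expensive part runs inside compiled Fortran:
    # return fortranlib.loglike(np.asarray(theta, dtype=np.float64))
    return -0.5 * np.sum(np.asarray(theta) ** 2)  # cheap stand-in for testing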

Edit: Please find below a minimal script for which the issue still occurs. I have been able to verify that f2py is apparently not causing the problems:

import numpy as np
import sys
import emcee
from emcee.utils import MPIPool

def calc_log_prob(a, b, c, d):
    # Artificial busy loop that mimics an expensive likelihood evaluation.
    for i in np.arange(1000):
        for j in np.arange(1000):
            for k in np.arange(1000):
                for g in np.arange(1000):
                    x = i + j + k + g

    return -np.abs(a + b)

def lnprob(x):
    return calc_log_prob(*x)

ndim, nwalkers = 4, 180

p0 = [np.array([np.random.normal(loc=-5.5, scale=2., size=1)[0],
                np.random.normal(loc=-0.3, scale=1., size=1)[0],
                0. + 3000.*np.random.uniform(size=1)[0],
                -6. + 3.*np.random.uniform(size=1)[0]]) for i in range(nwalkers)]

with MPIPool() as pool:

    if not pool.is_master():
        # Wait for instructions from the master process.
        pool.wait()
        sys.exit(0)

    sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob, pool=pool)

    pos, prob, state = sampler.run_mcmc(p0, 560)

# The with-statement closes the pool automatically when the master is done.

This script closely follows the example outlined in the emcee documentation, with the pool incorporated as prescribed. To be honest, I am completely clueless as to where the source of this malfunction lies. I am almost inclined to say that the issue is cluster-related rather than a problem in the script.
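One thing I am considering, in case the resource manager only delivers its termination signal to mpirun and the master rank, is to catch SIGTERM on the master and tear the whole run down explicitly with MPI.COMM_WORLD.Abort(). This is only a sketch based on mpi4py (which MPIPool uses underneath), not a verified fix, and the handler can only fire once the Python interpreter regains control:

import signal
from mpi4py import MPI

def abort_on_term(signum, frame):
    # Abort() asks the MPI runtime to kill every rank of the communicator,
    # including workers that are busy inside compiled Fortran code.
    MPI.COMM_WORLD.Abort(1)

# Installed on the master rank before the sampler starts, so that the
# SIGTERM sent by qdel tears down all 30 processes.
if MPI.COMM_WORLD.Get_rank() == 0:
    signal.signal(signal.SIGTERM, abort_on_term)

Whether qdel actually delivers SIGTERM to the master process at all is something I would still have to check on this cluster.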

  • are the remaining processes `MPI` apps ? (e.g. they called `MPI_Init()`, directly or via `mpi4py`) or are these non MPI processes that were fork&exec'ed via the MPI apps (or launched directly via ssh) ? – Gilles Gouaillardet Feb 12 '18 at 03:46
  • also, which MPI library (vendor+version) are you using ? – Gilles Gouaillardet Feb 12 '18 at 03:47
  • The remaining processes are just ``Python`` processes. Furthermore, I'm using the ``MPI`` library that was initially part of ``emcee``, but that has now been moved to ``schwimmbad``: https://github.com/adrn/schwimmbad/blob/master/schwimmbad/mpi.py – stroopwafel Feb 12 '18 at 10:58
  • this file is based on `mpi4py`, which is itself built on top of a "real" MPI implementation (such as MPICH, Open MPI or other commercial libs). How were these processes spawned? – Gilles Gouaillardet Feb 12 '18 at 11:54
  • Uhm, well, I submit the job by typing `` ``. Within this (parent) Python script, ``MPI`` should handle everything, as indicated in my post... – stroopwafel Feb 12 '18 at 14:16
  • when the app is running, try `ps -ef --forest`, that should tell you how each process was spawned. – Gilles Gouaillardet Feb 12 '18 at 14:45
  • Sadly, I cannot retrieve this information at the moment, as the processes are not running on the log-in node. – stroopwafel Feb 12 '18 at 17:09
  • Which MPI library and which resource manager are you running ? Was MPI built with support for this batch manager ? – Gilles Gouaillardet Feb 13 '18 at 12:28
  • Concerning the latter question: that should be all fine. – stroopwafel Feb 14 '18 at 13:39
  • Concerning the ``ps -ef --forest`` command: it says ``-tcsh`` --> ``/bin/tcsh -f /var/lib/torque/mom_priv/jobs/...`` --> ``mpirun -np 20 python second_test.py`` --> ``/usr/lib64/mpich/bin/hydra_pmi_proxy --control-port`` --> ```` – stroopwafel Feb 14 '18 at 13:40
