Greetings,
I apologize beforehand for the lengthy post.
My question: How do you modify the loop between the * * * markers in errFunction, the runABQfile function (subprocess.call), and the bash script below so that I can run a PSO optimization on a cluster?
The Background: I am calibrating a model using Particle Swarm Optimization (PSO) written in Python and ABAQUS with a VUMAT (user material). A Python script updates the input files of N different ABAQUS models (which correspond to N different experiments) at each iteration and should run each of the N models until the global error between experiments and models is minimized. I am running this optimization on a cluster where I do not have admin privileges.
Assume I have a working main script main.py that imports the necessary modules, initializes variables, and reads the experimental data before calling the PSO function in PSO.py using
XOpt, FOpt = pso(errFunction, lb, ub, f_ieqcons=mycons, args=args)
The target function errFunction to be minimized runs all N models using the runABQfile function and returns the global error of each iteration to the PSO function. A brief view of the structure of my code is shown below (I left out the parts that are not relevant).
def errFunction(param2Calibrate,otherProps,InputFiles,experimentData,otherArgs):
    maxNpts = otherArgs[0]
    nAnalysis = otherArgs[1]

    # Run Each Abaqus Simulation
    inpFile = [0] * nAnalysis
    abqDisp = [[0 for x in range(maxNpts)] for y in range(nAnalysis)]
    abqForce = [[0 for x in range(maxNpts)] for y in range(nAnalysis)]
    iexpForce = [[0 for x in range(maxNpts)] for y in range(nAnalysis)]
    Err = [0] * nAnalysis

    # ***********************************#
    # - Update and Run Each Input File - #
    for r in range(nParallelLoops):   # nParallelLoops is defined elsewhere
        for k in range( r*nAnalysis//nParallelLoops, (r+1)*nAnalysis//nParallelLoops ):
            # - Write and Run Abaqus INP file - #
            inpFile[k] = writeABQfile(param2Calibrate,otherProps[k],InputFiles[k])
            runABQfile(inpFile[k])
            # - Extract from Abaqus ODB - #
            abqDisp_, abqForce_ = extraction(inpFile[k])
            abqDisp[k][0:len(abqDisp_)] = abqDisp_
            abqForce[k][0:len(abqForce_)] = abqForce_
    # ***********************************#

    # - Interpolate Experimental Results to Match Abaqus - #
    for k in range(0,nAnalysis):
        iexpForce_ = interpolate(experimentData[k],abqDisp[k])
        iexpForce[k][0:len(iexpForce_)] = iexpForce_

    # - Get Error - #
    for k in range(0,nAnalysis):
        Err[k] = Error(iexpForce[k],abqDisp[k],abqForce[k])

    return Err
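To illustrate the kind of restructuring I am after, here is a rough sketch (not working code; runOneModel is a hypothetical helper) of the block between the * * * markers handing each model to a thread pool, so that several ABAQUS jobs run at the same time and errFunction only continues once all of them have finished. It relies on runABQfile blocking until its job is done, which subprocess.call does:

from multiprocessing.pool import ThreadPool

def runOneModel(k):
    # write the INP file, run ABAQUS (blocking), then read back the ODB results
    inpFile[k] = writeABQfile(param2Calibrate, otherProps[k], InputFiles[k])
    runABQfile(inpFile[k])
    return k, extraction(inpFile[k])

pool = ThreadPool(nParallelLoops)                 # e.g. 4 concurrent ABAQUS jobs
for k, (abqDisp_, abqForce_) in pool.imap_unordered(runOneModel, range(nAnalysis)):
    abqDisp[k][0:len(abqDisp_)] = abqDisp_
    abqForce[k][0:len(abqForce_)] = abqForce_
pool.close()
pool.join()                                       # all N models finished before the error loop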
And runABQfile is set up as follows, where the two processes are run in series:
def runABQfile(inpFile):
    import subprocess
    import os

    # - Run Abaqus - #
    ABQexe = '/opt/abaqus/6.14-1/code/bin/abq6141'
    prcStr1 = (ABQexe+' '+'job='+inpFile+' input='+inpFile+' \
        user=$HOME/mPDFvumatNED.f scratch=/scratch/$USER/$SLURM_JOBID \
        cpus=12 parallel=domain domains=12 mp_mode=mpi memory=60000mb \
        interactive double=both')
    prcStr2 = (ABQexe+' '+'cae noGUI='+inpFile+'_CAE.py')
    process = subprocess.call(prcStr1,stdin=None,stdout=None,stderr=None,shell=True)
    process = subprocess.call(prcStr2,shell=True)
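For debugging on the cluster, the same two commands could also be wrapped so that each job's console output goes to its own log file and the exit codes are returned to the caller (a minimal sketch; runABQfileLogged is a hypothetical name):

import subprocess

def runABQfileLogged(inpFile):
    # same two commands as runABQfile above, but with per-job logging and exit codes
    ABQexe = '/opt/abaqus/6.14-1/code/bin/abq6141'
    solveCmd = (ABQexe + ' job=' + inpFile + ' input=' + inpFile +
                ' user=$HOME/mPDFvumatNED.f scratch=/scratch/$USER/$SLURM_JOBID'
                ' cpus=12 parallel=domain domains=12 mp_mode=mpi memory=60000mb'
                ' interactive double=both')
    postCmd = ABQexe + ' cae noGUI=' + inpFile + '_CAE.py'
    with open(inpFile + '.log', 'w') as log:
        rcSolve = subprocess.call(solveCmd, shell=True, stdout=log, stderr=log)
        rcPost = subprocess.call(postCmd, shell=True, stdout=log, stderr=log)
    return rcSolve, rcPost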
Where the problem seems to be: I have access to a maximum of 2 nodes with 24 CPUs per job (restricted by the number of ABAQUS licenses). If I were to run a single analysis, I would queue the job using SLURM with the following script.
#!/bin/bash
#SBATCH --job-name="abaqus"
#SBATCH --output="abaqus.%j.%N.out"
#SBATCH --partition=debug
#SBATCH --nodes=2
#SBATCH --export=ALL
#SBATCH --ntasks-per-node=24
#SBATCH -L abaqus:25
#SBATCH -t 00:30:00
#Get the env file setup
scontrol show hostname > file-list1
scontrol show hostlist > file-list2
HOST1=`sed -n '1p' file-list1`
HOST2=`sed -n '2p' file-list1`
cat abq_v6.env |sed -e "s/host1/$HOST1/g" > ttt1.env
cat ttt1.env | sed -e "s/host2/$HOST2/g" > abaqus_v6.env
rm ttt*env
#Run the executable remotely
sed "s/DUMMY/$SLURM_JOBID/g" s4b.sh.orig > s4b.sh
chmod u+x s4b.sh
export EXHOST=`/bin/hostname`
ssh $EXHOST $SLURM_SUBMIT_DIR/s4b.sh
where s4b.sh.orig looks like this:
#!/bin/bash -l
cd /share/apps/examples/ABAQUS/s4b_multinode
module purge
module load abaqus/6.14-1
export EXE=abq6141
$EXE job=s4b scratch=/scratch/$USER/DUMMY cpus=48 -verbose 3 \
standard_parallel=all mp_mode=mpi memory=120000mb interactive
This script setup is the only way to submit one ABAQUS job that runs on multiple nodes on that cluster, because of problems with the ABAQUS environment file and SLURM (my guess is that mp_host_list is not being properly assigned or is oversubscribed, but honestly I do not understand what could be going on).
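For reference, one way I could imagine tying everything together (only a sketch, with the directives copied from the single-analysis script above) is to submit the whole optimization as a single job and run main.py inside the allocation, so that errFunction starts the individual ABAQUS runs itself:

#!/bin/bash
#SBATCH --job-name="pso_abaqus"
#SBATCH --output="pso_abaqus.%j.%N.out"
#SBATCH --partition=debug
#SBATCH --nodes=2
#SBATCH --export=ALL
#SBATCH --ntasks-per-node=24
#SBATCH -L abaqus:25
#SBATCH -t 00:30:00

# run the PSO driver inside the allocation; errFunction then launches the
# individual ABAQUS jobs through runABQfile
module purge
module load abaqus/6.14-1
cd $SLURM_SUBMIT_DIR
python main.py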
I modified my runABQfile function to use the same bash construct when calling subprocess.call, to something like this:
prcStr1 = ('sed "s/DUMMY/$SLURM_JOBID/g" s4b.sh.orig > s4b0.sh; \
sed "s/MODEL/inpFile/g" s4b0.sh > s4b1.sh; \
chmod u+x s4b1.sh; \
export EXHOST=`/bin/hostname`; \
ssh $EXHOST $SLURM_SUBMIT_DIR/s4b1.sh' )
process = subprocess.call(prcStr1,stdin=None,stdout=None,stderr=None,shell=True)
But the optimization never starts; it quits right after modifying the first script.
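One thing I notice is that the second sed command replaces MODEL with the literal string inpFile rather than the value of the Python variable, and every call writes the same s4b1.sh, so concurrent calls would overwrite each other. A corrected sketch of that pipeline (hypothetical names, untested) might be:

import subprocess

# interpolate the actual value of inpFile and give each model its own script
script = 's4b_' + inpFile + '.sh'
prcStr1 = ('sed -e "s/DUMMY/$SLURM_JOBID/g" -e "s/MODEL/' + inpFile + '/g" '
           's4b.sh.orig > ' + script + '; '
           'chmod u+x ' + script + '; '
           'ssh `/bin/hostname` $SLURM_SUBMIT_DIR/' + script)
ret = subprocess.call(prcStr1, shell=True)
if ret != 0:
    print('launch failed for ' + inpFile + ' (exit code ' + str(ret) + ')')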
Now the question again is: how do you modify the loop between the * * * markers in errFunction, the runABQfile function (subprocess.call), and the bash script so that I can run this optimization? I would like to use at least 12 processors per ABAQUS model, potentially running 4 jobs at the same time. Keep in mind that all N models need to run and finish before moving to the next iteration.
I would appreciate any help you guys could provide.
Sincerely,
D P.