
I am working on an older cluster that I do not administer; it has a locked-down configuration, which is causing me some issues. The system uses the original MPICH, and the cluster script is written in Perl using Parallel::MPI for the runs. This Monte Carlo script generates 5000 test cases that are then launched on the cluster. When I looked at the original code, it took around 500 (not 5000) of the test cases and split them across three files, which then passed them to the cluster about 260 at a time. I asked the system administrator if he knew why the programmer had done this, and he said it was because MPICH(1) would not allow more than 260 jobs to be sent at a time. I am not sure whether that was an MPICH1 limitation or a Parallel::MPI limitation.

So I rewrote the Perl program to generate 19 files with around 250 cases each, so that all 5000 cases get run. My question: I usually have one file that I launch with a pbs_mpirun command. The original program had three separate PBS launch files, so now I have 19. Can I launch them all from the same file? Do I have to put some kind of sleep between the mpirun commands? The way the cluster queues are set up, only one job per user can run on a given queue at a time. So if I submitted multiple runs to queue n64, only one would run at a time, which is fine, but I don't want to have to submit 19 runs and fill up the qstat list to complete one Monte Carlo if I don't have to. Something like the loop sketched below is what I have in mind.
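
Just as a sketch of what I mean (I have not tried this, and I am assuming pbs_mpirun does not return until its run finishes):

    # Run each Monte Carlo worker file in turn inside one PBS job.
    # Assumes pbs_mpirun blocks until each MPI run completes.
    for i in `seq 1 19`
    do
        pbs_mpirun mpi_workernode_${i}.pl
    done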

This may be something common, but I have just never dealt with it, so any advice would be appreciated. Below is my PBS file, which launches the first two of the Perl cluster files. The Perl cluster files are mpi_workernode_1.pl through mpi_workernode_19.pl.

    #!/bin/sh
    ### Lines starting with "#" are comments; batch system configuration
    ### commands start with "#PBS"
    #PBS -l walltime=12:00:00
    #PBS -N MONTE
    ### Declare job non-rerunable
    #PBS -r n
    ### Output files (overwritten in successive jobs)
    #PBS -e system1:/filearea
    #PBS -o system1:/filearea
    ### Return error and output on output stream
    #PBS -j oe
    ### Queue name (small, medium, long, verylong)
    #PBS -q n64@old_cluster
    #PBS -l select=64:ncpus=1
    #PBS -l place=free
    ##PBS -m e
    #PBS -W group_list=groupa

    cd /filearea
    # Count all available processors 
    NPROCS=`grep -v "\#" $PBS_NODEFILE | wc -l` 
    pbs_mpirun mpi_workernode_1.pl
    pbs_mpirun mpi_workernode_2.pl
Carole
1 Answer


This sounds like an issue that's pretty specific to your system, so it might be hard to get useful advice here. However, if you have a home directory on the machine, you can usually install your own MPI in there and launch that. You'll just add --prefix=$HOME/<path to install> to your ./configure line and you should be ready to go. You'll probably need to modify your PBS script so it uses your MPI instead of the default one. That's probably just a matter of combining the NPROCS line and a pbs_mpirun line to look like:

/path/to/mpiexec -n <num_procs> /path/to/mpi_program
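
Roughly, the build and the PBS change would look something like this; the install prefix here is just an example path, not something specific to your system:

    # Build MPICH from source and install it under the home directory
    ./configure --prefix=$HOME/mpich-install
    make
    make install

    # In the PBS script, put the private install first on PATH before launching
    export PATH=$HOME/mpich-install/bin:$PATH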

This assumes a couple of things.

  1. You have some sort of NFS sharing set up for your home directory. Without this, you'll have to copy the MPI executables to all of the nodes in your system, which is a pain.
  2. You have access to the original MPI program and can execute it directly, without your wrapper script. If you do, the whole process will be easier.
  3. Your system isn't doing some nastiness that prevents you from running your own MPI. I have used systems in the past that made it difficult / impossible to replace the default MPI library with your own. Your system probably isn't like that, but you'll have to experiment to find out.
Wesley Bland
  • This reads like a recipe for raising OP's confusion by an order of magnitude. Make a private installation of MPI? How does that either answer the question or make it easier to run the large number of jobs that seem to be required? Going round the existing job management system won't magic up another cluster; it's more likely to load the existing cluster with work that the current job management system doesn't know about. I can't see that raising the system's throughput. – High Performance Mark Aug 16 '13 at 14:16
  • If someone can figure out the solution to the OP's problem, they're welcome to submit another answer. This answer is a valid one that you disagree with. I agree that it's not a great solution, but if you don't have admin access to update the existing installation and you need to have newer features than MPICH 1.0, this is a way to solve the problem. You're welcome to vote it down. – Wesley Bland Aug 16 '13 at 14:25
  • Wesley, thanks for the input. I do wish I could just upgrade MPICH altogether, but MPICH in my home directory is not an option: this script is going to be run by a bunch of people from a different team, so linking to my home directory isn't workable. Luckily, this old cluster won't be set up this way for too long, as I am going to get to convert it into a testing cluster. For now, though, I am stuck with what I have. I just had not seen anyone do multiple mpirun calls from the same PBS script. I did not know if that was common and maybe I was just living a sheltered life :-> – Carole Aug 16 '13 at 14:59
  • I don't think anything bad will happen if you do multiple calls from the same PBS script. Give it a try and report back. – Wesley Bland Aug 16 '13 at 18:29