I've got the following .sh file which can be run on a cluster computer using sbatch:
Shell.sh
#!/bin/bash
#
#SBATCH -p smp # partition (queue)
#SBATCH -N 2 # number of nodes
#SBATCH -n 2 # number of tasks (one per srun step)
#SBATCH --mem 2000 # memory pool for all cores
#SBATCH -t 5-0:00 # time (D-HH:MM)
#SBATCH -o out.out # STDOUT
#SBATCH -e err.err # STDERR
module load R
srun -N1 -n1 R CMD BATCH ./MyFile.R &
srun -N1 -n1 R CMD BATCH ./MyFile2.R &
wait
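I submit this from the login node with
sbatch Shell.sh
and the two R jobs then run in parallel as separate job steps.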
My problem is that MyFile.R and MyFile2.R look almost identical:
MyFile.R
source("Experiment.R")
Experiment(args1) # some arguments
MyFile2.R
source("Experiment.R")
Experiment(args2) # some arguments
In fact, I need to do this for about 100 runs. Since every file just sources the same script and then calls Experiment() with different arguments, I was wondering whether I could avoid creating a new file for each run. I want all of the processes to run in parallel, so I don't think I can simply put everything into a single sequential R file.
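One workaround I considered is a single parameterized file, say RunExperiment.R (a name I'm making up here), that reads its arguments from the command line so that the same file serves every run:
RunExperiment.R
# commandArgs(trailingOnly = TRUE) returns whatever follows --args
# on the R CMD BATCH command line, as character strings
args <- commandArgs(trailingOnly = TRUE)
source("Experiment.R")
Experiment(args)
which could then be launched as, e.g.,
srun -N1 -n1 R CMD BATCH '--args args1' RunExperiment.R out1.Rout &
But that still requires an intermediate file, so ideally I'd skip the file entirely.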
My question is: is there some way to run the process directly from the shell, without a separate R file for each run? That is, can I do something like
srun -N1 -n1 R CMD BATCH 'source("Experiment.R"); Experiment(args1)' &
srun -N1 -n1 R CMD BATCH 'source("Experiment.R"); Experiment(args2)' &
wait
instead of the last three lines in Shell.sh?
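If R CMD BATCH can't take an expression, something based on Rscript -e (which, as I understand it, evaluates an expression given on the command line) would suit me just as well, roughly:
srun -N1 -n1 Rscript -e 'source("Experiment.R"); Experiment(args1)' > out1.log 2>&1 &
srun -N1 -n1 Rscript -e 'source("Experiment.R"); Experiment(args2)' > out2.log 2>&1 &
wait
For the full set of ~100 runs I could then generate these lines in a shell loop over a (hypothetical) argument array ARG_LIST instead of writing them out by hand:
for a in "${ARG_LIST[@]}"; do
    srun -N1 -n1 Rscript -e "source('Experiment.R'); Experiment($a)" > "out_$a.log" 2>&1 &
done
wait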