I'm using `snakemake` to build a variant calling pipeline that can be run on a SLURM cluster. The cluster has login nodes and compute nodes. Any real computing should be done on the compute nodes in the form of an `srun` or `sbatch` job, and jobs are limited to 48 hours of runtime. My problem is that processing many samples, especially when the queue is busy, will take more than 48 hours to run all the rules for every sample. The traditional cluster execution mode of `snakemake` leaves a master process running that only submits a rule's job to the queue after all of that rule's dependencies have finished. I'm supposed to run this master process on a compute node, so this limits the runtime of my entire pipeline to 48 hours.
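For concreteness, this is roughly how I'm running it now (the resource values are just examples):

```bash
# master process: stays alive on a compute node for the whole workflow,
# submitting each rule's job only after its dependencies have finished
snakemake --jobs 100 \
    --cluster "sbatch --time=48:00:00 --cpus-per-task={threads}"
```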
I know SLURM jobs have dependency directives that tell a job to wait to run until other jobs have finished. Because the `snakemake` workflow is a DAG, is it possible to submit all the jobs at once, with each job's dependencies defined by its rule's dependencies in the DAG? After all the jobs were submitted, the master process would exit, circumventing the 48 hour limit.
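Written out by hand, the kind of chain I mean looks like this (the script names are made up; `--parsable` makes `sbatch` print only the job id):

```bash
# submit the first step and capture its job id
jid1=$(sbatch --parsable align.sh)
# the second step is queued immediately but only starts
# after the first finishes successfully
sbatch --parsable --dependency=afterok:${jid1} call_variants.sh
```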
Is this possible with `snakemake`, and if so, how does it work? I've found the `--immediate-submit` command line option, but I'm not sure whether it has the behavior I'm looking for, or how to use it, because my cluster prints `Submitted batch job [id]` after a job is submitted to the queue instead of just the job id.
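From the documentation snippets I've found, it sounds like `{dependencies}` in the `--cluster` command is filled with the space-separated job ids of a job's dependencies, and that `snakemake` reads a submitted job's id back from the first line of the submit command's stdout. So my untested guess is a wrapper like this (the name `immediate_submit.sh` and the argument handling are my own invention):

```bash
#!/usr/bin/env bash
# immediate_submit.sh -- untested sketch of a submit wrapper.
# Expected call from snakemake:
#   immediate_submit.sh <dep_id> <dep_id> ... <jobscript>
# where the ids come from {dependencies} and snakemake appends
# the jobscript as the last argument.

jobscript="${@: -1}"        # last argument: the job script
deps=("${@:1:$#-1}")        # everything before it: dependency job ids

if [ "${#deps[@]}" -gt 0 ]; then
    # build --dependency=afterok:id1:id2:...
    depflag="--dependency=afterok:$(IFS=:; echo "${deps[*]}")"
else
    depflag=""
fi

# --parsable prints only the job id, which snakemake needs on stdout
# to fill {dependencies} for the jobs that depend on this one
sbatch --parsable $depflag "$jobscript"
```

invoked with something like `snakemake --immediate-submit --cluster './immediate_submit.sh {dependencies}'`. Is a wrapper along these lines the intended way to use `--immediate-submit`, or am I misunderstanding how the dependencies are passed?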