Questions tagged [pbs]

PBS stands for Portable Batch System and describes a family of software products for high-performance computing. The software is a resource manager used to manage jobs, including their submission, execution, and basic monitoring. PBS implementations are commonly used in UNIX cluster environments and are often paired with external schedulers.

Modern PBS daemons are descended from OpenPBS; two notable descendants are TORQUE and PBS Professional (PBS Pro). TORQUE is an open-source product maintained by Adaptive Computing; more information can be found on its Wikipedia page, and documentation is available on Adaptive's website. PBS Pro is a commercial product developed by Altair Engineering, which publishes the PBS Pro user guide.
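
As a concrete illustration of that workflow, here is a minimal job script of the kind discussed throughout this tag (a sketch only; queue names, resource strings and file names vary by site). It is submitted with "qsub hello.pbs", which prints a job id, and the job's state can then be checked with qstat; both commands are common to TORQUE and PBS Pro.

    #!/bin/bash
    #PBS -N hello_pbs
    #PBS -l nodes=1:ppn=1
    #PBS -l walltime=00:05:00
    # run from the directory the job was submitted from
    cd "$PBS_O_WORKDIR"
    echo "running on $(hostname)"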

428 questions
2 votes, 1 answer

'cd' command not working from within PBS script

This is driving me crazy. The PBS script below works fine except for the cd command: if the line cd $PBS_O_WORKDIR is uncommented, the job runs forever on the cluster. #PBS -lnodes=1:ppn=8 #PBS -lwalltime=48:00:00 #PBS -S…
Roland
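
A minimal sketch of the kind of script involved, assuming a job that simply needs to run from the submission directory; node counts, walltime and the executable name are illustrative:

    #!/bin/bash
    #PBS -l nodes=1:ppn=8
    #PBS -l walltime=48:00:00
    #PBS -S /bin/bash
    # $PBS_O_WORKDIR is set by PBS to the directory qsub was invoked from.
    # Quoting guards against spaces, and "|| exit 1" aborts the job instead of
    # silently running in the wrong directory if that path is not mounted on
    # the compute node (a common cause of apparent hangs).
    cd "$PBS_O_WORKDIR" || exit 1
    ./my_program    # hypothetical executable
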
2 votes, 1 answer

Is Celery a "real" scheduler like PBS, MESOS or YARN?

I have an existing application that is using Celery. Clients submit tasks to Celery, and Celery's workers pull those tasks off the queue and run them across different physical hosts. Each Celery worker runs one task at a time. A given physical host has multiple…
Klun
2 votes, 2 answers

Using GNU Parallel etc. with the PBS queue system to run 2 or more MPI codes across multiple nodes as a single job

I am trying to run more than one MPI code (e.g. 2) in the PBS queue system across multiple nodes as a single job. For my cluster, 1 node = 12 procs. I need to run 2 codes (abc1.out & abc2.out) as a single job, each code using 24 procs. Hence, I need…
quarkz
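
One common pattern for the question above (a sketch, not a verified recipe) is to request all the nodes in one job, split $PBS_NODEFILE into one machine file per code, and give each mpirun its own hosts; the executable names come from the question.

    #!/bin/bash
    #PBS -l nodes=4:ppn=12
    #PBS -l walltime=24:00:00
    cd "$PBS_O_WORKDIR"
    # $PBS_NODEFILE holds one line per allocated core (48 lines here):
    # give the first 24 to abc1.out and the last 24 to abc2.out
    head -n 24 "$PBS_NODEFILE" > hosts1
    tail -n 24 "$PBS_NODEFILE" > hosts2
    mpirun -np 24 -machinefile hosts1 ./abc1.out &
    mpirun -np 24 -machinefile hosts2 ./abc2.out &
    wait    # keep the job alive until both runs finish
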
2 votes, 0 answers

qsub PBS job doesn't produce the error and output logs, even when forcing the path

When I run code using qsub and a PBS script, the output and error log files are not produced. I have also tried to set the paths of the error and output files, but without success. #PBS -N example_job #PBS -j oe #PBS -q shortp #PBS -V ##PBS -v BATCH_NUM_PROC_TOT=16 #PBS…
frank
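
For reference, a hedged sketch of directives that name the log file explicitly; the absolute path is a placeholder, and by default the .o/.e files are only copied back to the submission directory after the job ends.

    #!/bin/bash
    #PBS -N example_job
    #PBS -q shortp
    #PBS -j oe
    #PBS -o /home/user/logs/example_job.log
    # -j oe merges stderr into stdout; -o names the merged log file.
    # The path above is a placeholder: the directory must exist and be writable
    # from the compute node, otherwise the default jobname.oJOBID file appears
    # in the submission directory once the job finishes.
    echo "hello from $(hostname)"
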
2 votes, 1 answer

Make the current file name & extension with Condor match the output & error files? (so that PBS, Slurm and Condor produce the same output file names)

How do I make condor name my files as follows: meta_learning_experiments_submission.py.e451863 meta_learning_experiments_submission.py.o444375 $(FILENAME).e$(CLUSTER) $(FILENAME).e$(CLUSTER) I tried it but it doesn't seem to work. e.g. so that it…
Charlie Parker
2 votes, 1 answer

Julia parallel processing on PBS multiple nodes

I am looking for a way to run simple parallel processes (one function run multiple times with different arguments, no communication between processes) across multiple nodes in a PBS cluster. Currently I am able to run it on a single node by setting the…
tidus95
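
A minimal sketch of one way to spread such independent runs over several nodes: launch Julia with its --machine-file option pointing at $PBS_NODEFILE, so a worker is started for every allocated core (the script name is hypothetical).

    #!/bin/bash
    #PBS -l nodes=2:ppn=8
    #PBS -l walltime=02:00:00
    cd "$PBS_O_WORKDIR"
    # $PBS_NODEFILE repeats each hostname once per allocated core,
    # so this starts 16 workers spread across the two nodes
    julia --machine-file "$PBS_NODEFILE" run_simulations.jl
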
2 votes, 1 answer

Multiprocessing on PBS cluster node

I have to run multiple simulations of the same model with varying parameters (or random number generator seeds). Previously I worked on a server with many cores, where I used the Python multiprocessing library with apply_async. This was very handy as I…
tidus95
2 votes, 1 answer

MPI_Comm_spawn fails with "All nodes which are allocated for this job are already filled"

I'm trying to use Torque's (5.1.1) qsub command to launch multiple OpenMPI processes, one process per node, and have each process launch a single process on its own local node using MPI_Comm_spawn(). MPI_Comm_spawn() is reporting: All nodes…
Kurt
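
The usual workaround, sketched here as an assumption rather than a verified fix, is to request more cores per node than the initial ranks occupy so that Open MPI still has free slots when MPI_Comm_spawn() fires; ./parent is a hypothetical binary.

    #!/bin/bash
    #PBS -l nodes=2:ppn=2
    #PBS -l walltime=00:30:00
    cd "$PBS_O_WORKDIR"
    # start one rank per node; the second core on each node is left free
    # for the child that each rank creates with MPI_Comm_spawn()
    mpirun -np 2 --map-by ppr:1:node ./parent
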
2 votes, 0 answers

PBS Pro: setting job array slot limit by the user

Using Torque, a user can specify a slot limit when submitting a job array by using %, e.g. qsub job.sh -t 1-20%5 will create a job array with 20 jobs, but with only 5 running simultaneously. Currently I work with PBS Professional, but…
andywiecko
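
For reference, the two array syntaxes side by side; the Torque line is taken from the question, and the PBS Professional line is its plain equivalent (as far as I know, PBS Pro's -J accepts start-end[:step] but no % slot-limit suffix, so throttling has to be handled another way).

    # Torque: 20 sub-jobs, at most 5 running at once
    qsub -t 1-20%5 job.sh
    # PBS Professional: 20 sub-jobs, no built-in slot limit in this form
    qsub -J 1-20 job.sh
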
2 votes, 1 answer

Unable to run PBS script on multiple nodes using GNU parallel

I have been trying to use multiple nodes in my PBS script to run several independent jobs. Each individual job is supposed to use 8 cores and each node in the cluster has 32 cores. So, I would like to have each node run 4 jobs. My PBS script is as…
tobiuchiha
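
A hedged sketch of one way to get 4 concurrent 8-core tasks on each 32-core node with GNU Parallel; run_case.sh and the case names are placeholders.

    #!/bin/bash
    #PBS -l nodes=2:ppn=32
    cd "$PBS_O_WORKDIR"
    # one ssh login per distinct node
    sort -u "$PBS_NODEFILE" > nodes.txt
    # -j4 runs at most 4 tasks per remote host, i.e. 4 x 8 = 32 cores per node
    parallel --sshloginfile nodes.txt -j4 --workdir "$PBS_O_WORKDIR" \
        ./run_case.sh {} ::: case01 case02 case03 case04 case05 case06 case07 case08
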
2 votes, 1 answer

How can I restart a failed PBS job on a cluster (qsub)?

I'm running a PBS job (Python) on the cluster using the qsub command. How can I restart the same job from the step where it failed? Any help will be highly appreciated.
user1410665
2 votes, 0 answers

PBS jobs vs PBS job-arrays

What is the difference between submitting individual jobs as PBS scripts and submitting them as a single PBS-array? (I am getting a significant run-time improvement for the latter)
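
For concreteness, the two submission styles being compared look roughly like this (Torque syntax; job.sh and the range are illustrative). A job array is a single submission whose sub-jobs share one script and are told apart inside it by $PBS_ARRAYID, which typically costs the scheduler far less than many separate qsub calls.

    # individual jobs: one qsub call (and one scheduler entry) per task
    for i in $(seq 1 100); do
        qsub -v TASK_ID="$i" job.sh
    done

    # job array: a single qsub creating 100 sub-jobs;
    # inside job.sh, use $PBS_ARRAYID (Torque) to pick the task
    qsub -t 1-100 job.sh
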
2 votes, 1 answer

'sed' command does not work normally in PBS scripts

I use Torque to submit the test script shown below: #!/bin/bash #PBS -N test #PBS -l nodes=1:ppn=1 #PBS -q ser #PBS -V #PBS -S /bin/bash sed 's/a//' <<< aaabbbaaa sed 's/\(a\)//' <<< aaabbbaaa sed 's/a\+//' <<< aaabbbaaa The expected output should be…
Dizzam
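
Reconstructed from the truncated excerpt, the test script looks like this, with the output one would normally expect from GNU sed added as comments; whether the cluster's sed matches these is exactly what the question is about.

    #!/bin/bash
    #PBS -N test
    #PBS -l nodes=1:ppn=1
    #PBS -q ser
    #PBS -V
    #PBS -S /bin/bash
    sed 's/a//'     <<< aaabbbaaa   # expected: aabbbaaa (first 'a' removed)
    sed 's/\(a\)//' <<< aaabbbaaa   # expected: aabbbaaa
    sed 's/a\+//'   <<< aaabbbaaa   # expected: bbbaaa (GNU BRE \+ removes the leading run)
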
2 votes, 2 answers

multiple qsub commands simultaneously

I am using my department's computing cluster with Sun Grid Engine. When I have to run multiple R jobs, I usually write shell script files with names s01.sh, s02.sh,...,s50.sh which have 'R CMD BATCH r01.r','R CMD BATCH r02.r',...,'R CMD BATCH…
user67275
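
Rather than maintaining fifty near-identical wrapper scripts, a short submission loop (a sketch; the file naming follows the question) works the same way under SGE or PBS.

    # submit s01.sh ... s50.sh in one go
    for i in $(seq -w 1 50); do
        qsub "s${i}.sh"
    done
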
2 votes, 2 answers

Exclude (or include) specific nodes in PBS Pro

I'm working on a cluster with 8 nodes; 4 nodes have python and 4 don't. How can I ensure that my python jobs only go to the nodes with python? I do not have admin rights on the cluster. PBS Pro 13.1, RedHat 5.11. This question has been asked before,…
jerrytown
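
A hedged sketch of the usual PBS Pro fallback when no custom resource can be added without admin rights: name a Python-capable host explicitly in the select statement (node name and core count are placeholders).

    # request one chunk pinned to a specific host known to have python
    qsub -l select=1:ncpus=4:host=node05 job.sh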