I have a complex HPC (OpenPBS 20.0.1, Ubuntu 18.04) workflow that uses a Python subprocess to execute qsub, which ultimately launches a Singularity (3.9.9) container via the exec command to run a binary with arguments. The complexity is driven by the needs of long-standing workflows that users depend on. Below is an excerpt.
For some unknown reason, the stdout and stderr of the binary executed inside the Singularity container are not being captured. The other redirects and the verbose output from Singularity's debug option are captured in the process-output.txt file; only the binary's own output is missing.
Any thoughts why? Appreciate any input.
from subprocess import Popen, PIPE
import sys

def _execute_pbs_jobs():
    # Pipe the generated job script to qsub on the head node over ssh
    proc = Popen('ssh ${USER}@server_hostname /opt/pbs/bin/qsub', shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, close_fds=True)
    job_string = """
#!/bin/bash -x
#PBS -N %(name)s
#PBS -l walltime=%(walltime)s
#PBS -q workq
##PBS -j oe
#PBS -k eod
#PBS -l %(processors)s
#PBS -l mem=%(memory)s
#PBS -W umask=66
PROCESS_OUTPUT="${JOBDIR}/process-output.txt"
singularity --debug exec --env "${job_inst_env_variables}" --bind ${JOBDIR}/home:/home,%(context_dir)s,/mnt/pathfinder/logs,/mnt/pathfinder/logs/output/,/mnt/lustre/customers/:/mnt/pathfinder/lustre/customers /apps/containers/%(container_name)s %(binary)s "${JOB_ARGUMENTS_ARRAY[@]}" >> ${PROCESS_OUTPUT} 2>&1
""" % ({"job_names": job_names
        ,"walltime": walltime
        ,"processors": total_cores
        ,"context_dir": self.context_directory
        ,"binary": self.binary_name
        ,"name": ''.join(x for x in self.binary_name if x.isalnum())
        ,"arguments": arguments
        ,"env_vars": env_variables
        ,"inst_env_variables": inst_env_variables
        ,"memory": memory
        ,"num_jobs": self.get_instance_count()
        ,"date_string": exec_date.strftime("%Y.%m.%d.%a.%H.%M.%S")
        ,"qsub_args": qsub_arguments
        ,"container_name": self.container_image_name
        ,"debug_mode": debug_mode_arg
        })
    # Send job_string to qsub
    if sys.version_info > (3, 0):
        proc.stdin.write(job_string.encode('utf-8'))
    else:
        proc.stdin.write(job_string)
    stdout, stderr = proc.communicate()
    return stdout.rstrip()
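For reference, here is a minimal, self-contained sketch of the submission pattern above, with `cat` standing in for `ssh user@host /opt/pbs/bin/qsub` so it can be run anywhere (the `submit` helper and the stand-in command are my own names, not part of the workflow). Passing the script via `communicate(input=...)` writes it, closes stdin, and reads both output streams in one step, which also avoids a potential deadlock that a bare `stdin.write()` can hit with large scripts:

```python
from subprocess import Popen, PIPE

def submit(job_script, cmd="cat"):
    # cmd="cat" echoes its stdin back, standing in for the real
    # 'ssh ${USER}@server_hostname /opt/pbs/bin/qsub' submission command
    proc = Popen(cmd, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, close_fds=True)
    # communicate() writes the script to stdin, closes it, and
    # collects stdout/stderr until the process exits
    stdout, stderr = proc.communicate(input=job_script.encode("utf-8"))
    if proc.returncode != 0:
        raise RuntimeError("submission failed: %s" % stderr.decode())
    return stdout.decode().rstrip()

job_id = submit("#!/bin/bash\necho hello\n")
# With the real qsub, job_id would be the job identifier printed by qsub;
# with the cat stand-in it is simply the script echoed back.
```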