I have a watchfolder where files are dropped. Via a cron job I start a Python script that first checks for new files:
import glob
import os
import sys

def file_to_process():
    filePath = "".join([base_url_gpfs, '/*'])
    if glob.glob(filePath + "*.xml"):
        set_pid_file()
        # find the oldest xml file and derive the corresponding wav file
        xmlToProcess = min(glob.glob(filePath + "*.xml"), key=os.path.getctime)
        fileToProcess = xmlToProcess[:-3] + 'wav'
        if not os.path.isfile(fileToProcess):
            logger.error("{} not found in {}".format(fileToProcess, filePath))
            sys.exit(1)
        return xmlToProcess, fileToProcess
    else:
        os._exit(0)
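For reference, the entry point that cron runs boils down to something like the sketch below; upload_to_cloud and the trailing cleanup are simplified stand-ins, not the real names:

# Simplified sketch of the cron entry point; upload_to_cloud is a
# placeholder name for the real upload step.
if __name__ == "__main__":
    xmlToProcess, fileToProcess = file_to_process()  # exits if there is nothing to do
    upload_to_cloud(xmlToProcess, fileToProcess)     # placeholder for the cloud upload
    # ... wait for the cloud, remove the pid file, continue with other tasks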
If a new file is found, the script creates a pid file and uploads the file to a cloud service:
def set_pid_file():
    if os.path.isfile(pidfile):
        logger.info('Process is already running')
        os._exit(0)
    else:
        pid = str(os.getpid())
        # use a context manager so the pid is flushed and the handle is closed
        with open(pidfile, 'w') as f:
            f.write(pid)
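As a side note, the isfile check and the open() above are two separate steps, so two cron runs that start almost simultaneously can both pass the check before either has written the pid file. A sketch of an atomic variant (assuming Python 3, reusing the pidfile and logger names from above):

def set_pid_file_atomic():
    # O_CREAT | O_EXCL makes os.open() fail if the file already exists,
    # so the existence check and the creation happen in one atomic step.
    try:
        fd = os.open(pidfile, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        logger.info('Process is already running')
        sys.exit(0)
    with os.fdopen(fd, 'w') as f:
        f.write(str(os.getpid()))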
When the processing in the cloud is done, I remove the pid file, but the script keeps running and performs other tasks. At that point a new instance of the script can start as soon as another file is available. However, when several instances run at the same time, the script seems to lose track somewhere and fails. So I'm looking for a more reliable way to run multiple instances of the same script in parallel.
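One direction I could imagine, sketched below under the assumption that the watchfolder and a separate processing directory sit on the same filesystem: each instance claims one XML (plus its wav) by atomically renaming it into the processing directory, so concurrent instances can never grab the same file and no shared pid file is needed. The names claim_next_file, watch_dir and work_dir are only illustrative:

def claim_next_file(watch_dir, work_dir):
    # Illustrative sketch: os.rename is atomic within one filesystem, so if
    # two instances race for the same XML only one rename succeeds.
    for xml_path in sorted(glob.glob(os.path.join(watch_dir, '*.xml')),
                           key=os.path.getctime):
        claimed_xml = os.path.join(work_dir, os.path.basename(xml_path))
        try:
            os.rename(xml_path, claimed_xml)   # atomic claim
        except OSError:                        # another instance was faster
            continue
        wav_path = xml_path[:-3] + 'wav'
        if os.path.isfile(wav_path):
            os.rename(wav_path, claimed_xml[:-3] + 'wav')
        return claimed_xml
    return None                                # nothing left to claim

Would something along these lines be a sensible approach, or is there a better pattern for this?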