
I have a watchfolder where files are dropped. Via a cron job I start a Python script that first checks for new files.

import glob
import os
import sys

# logger, base_url_gpfs and pidfile are defined elsewhere in the script
def file_to_process():
    filePath = base_url_gpfs + '/'
    xmlFiles = glob.glob(filePath + "*.xml")
    if xmlFiles:
        set_pid_file()
        # find the oldest xml file and derive the corresponding wav file
        xmlToProcess = min(xmlFiles, key=os.path.getctime)
        fileToProcess = xmlToProcess[:-3] + 'wav'
        if not os.path.isfile(fileToProcess):
            logger.error("{} not found".format(fileToProcess))
            sys.exit(1)
        return xmlToProcess, fileToProcess
    else:
        os._exit(0)

If a new file is found, the script creates a pid file and uploads the file to a cloud service.

def set_pid_file():
    if os.path.isfile(pidfile):
        logger.info('Process is already running')
        os._exit(0)
    else:
        # write the current process id to the pid file
        pid = str(os.getpid())
        with open(pidfile, 'w') as f:
            f.write(pid)

When the processing in the cloud is done, I remove the pid file, but the script keeps running and performs other tasks. At that moment a new instance of the script can start if a new file is available. But the script seems to lose track somewhere when multiple instances are running at the same time, and it fails. So I'm looking for a more reliable way to run several instances of the same script in parallel.
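
Simplified, the rest of the flow looks roughly like this (upload_to_cloud and do_other_tasks are placeholders for the actual processing, not real function names from the script):

def process(xmlToProcess, fileToProcess):
    upload_to_cloud(fileToProcess)   # blocks until the cloud processing is done
    os.remove(pidfile)               # release the lock so a new instance can start
    do_other_tasks(xmlToProcess)     # this instance keeps running after the lock is gone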

  • There are a lot of ways to approach this, but perhaps the easiest modification *to what you are already doing* would be to instead create a pid file (essentially a lock) per input file, rather than a lock for the process overall. You could signal that in a few ways as well: rename the files as you start processing them, move them to a new "in process" folder, or drop a lock file unique to each file (perhaps based on the name of the file). Then multiple cron jobs should be able to run at once and not interfere. – totalhack Feb 24 '20 at 17:29
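
A minimal sketch of the per-file lock idea from the comment above, assuming a lock file named after each input file (claim_file and release_file are illustrative names, not part of the original script):

import os

def claim_file(xmlPath):
    lockPath = xmlPath + '.lock'
    try:
        # O_CREAT | O_EXCL fails atomically if the lock file already exists
        fd = os.open(lockPath, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False            # another instance is already working on this file
    with os.fdopen(fd, 'w') as f:
        f.write(str(os.getpid()))
    return True

def release_file(xmlPath):
    os.remove(xmlPath + '.lock')

With this approach, each cron run claims only files whose lock it could create and skips the rest, so several instances can process different files in parallel without sharing one global pid file.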

0 Answers