
I am creating a compute cluster in Python using dispy. One of my use cases would be solved very nicely by starting a process on a compute node that itself starts a distributed process. I have therefore created a SharedJobCluster in the primary script (which talks to the shared scheduler), and also inside the function that is sent to the cluster (which should, in turn, start a series of distributed processes). However, when the second SharedJobCluster is initiated, the code hangs on that line and never moves past it, nor does it show any errors.

Minimum working example:

def clusterfun():
    import dispy
    import test2

    import logging
    log_filename = 'worker.log'
    logging.basicConfig(filename=log_filename,
                        level=logging.DEBUG,
                        format='%(asctime)s %(name)-12s %(levelname)-8s %(message)s',
                        datefmt='[%m-%d-%Y %H:%M:%S]')

    logging.info("Starting cluster...")

    # THE FOLLOWING LINE HANGS
    cluster = dispy.SharedJobCluster(test2.clusterfun2, port=0, scheduler_node='127.0.0.1') 

    logging.info("Started cluster...")

    job = cluster.submit()  # submit clusterfun2 to the shared scheduler

    logging.info("Submitted job...")

    return job()  # wait for the nested job to finish and return its result


if __name__ == '__main__':

    import dispy

    #
    # Start the Compute cluster
    #
    cluster = dispy.SharedJobCluster(clusterfun, port=0, depends=['test2.py'], scheduler_node='127.0.0.1')

    job = cluster.submit()  # submit clusterfun to the shared scheduler

    print(job())  # wait for clusterfun to finish and print its result

test2.py contains:

def clusterfun2():

    return "Foo"

For reference, I am currently running dispyscheduler.py, dispynode.py, and this Python code all on the same machine. This setup works, except when trying to initiate the nested distributed task.
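
For completeness, the scheduler and node are started in separate terminals before running the script above, along the lines of the following (any extra options omitted):

dispyscheduler.py   # the shared scheduler that both SharedJobCluster instances connect to
dispynode.py        # the compute node that actually executes the submitted functions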

The worker.log output contains "Starting cluster..." but nothing else; the "Started cluster..." line is never logged.

If I check the status of the node, it reports that it is running 1 job, but that job never completes.
