
I have the below code that I am running

try:
    child = pexpect.spawn(
        ('some command --path {0}  somethingmore --args {1}').format(
            <iterator-output>,something),
        timeout=300)
    child.logfile = open(file_name,'w')
    child.expect('x*')
    child.sendline(something)
    child.expect('E*')
    child.sendline(something)
   #child.read()
    child.interact()
    time.sleep(15)
    print child.status
except Exception as e:
    print "Exception in child process"
    print str(e)

Now, the command in pexpect creates a subprocess by taking one of its inputs from a loop. Every time it spins up a subprocess, I try to capture the logs via child.read, but in this case it waits for that subprocess to complete before going round the loop again. How do I make it keep running in the background? I get the logs of the command input/output that I enter dynamically, but not of the process that runs thereafter, unless I use read or interact. I tried How do I make a command to run in background using pexpect.spawn? but it uses interact, which again waits for that subprocess to complete. Since the loop will be iterated almost more than 100 times, I cannot wait on one to complete before moving to the next. The command in pexpect is an AWS Lambda call; all I need is to make sure the command is triggered, but I am not able to capture the process output of that call without waiting for it to complete. Please let me know your suggestions.

dheeraj tripathi

2 Answers


If you want to run a process in the background, but at the same time interact with it, the simplest solution is to just kick off a thread to interact with the process.*


In your case, it sounds like you're running hundreds of processes, so you want to run some of them in parallel, but maybe not all of them at once? If so, you should use a thread pool or executor. For example, using concurrent.futures from the stdlib (or pip install the futures backport if your Python is too old):

import time
import concurrent.futures
import pexpect

def run_command(path, arg):
    try:
        child = pexpect.spawn(('some command --path {0}  somethingmore --args {1}').format(path, arg), timeout=300)
        child.logfile = open(file_name,'w')
        child.expect('x*')
        child.sendline(something)
        child.expect('E*')
        child.sendline(something)
        # child.read()
        child.interact()
        time.sleep(15)
        print child.status
    except Exception as e:
        print "Exception in child process"
        print str(e)

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as x:
    fs = []
    for path, arg in some_iterable:
        fs.append(x.submit(run_command, path, arg))
    concurrent.futures.wait(fs)

If you need to return a value (or raise an exception) from the threaded code, you'll probably want a loop over as_completed(fs) instead of just plain wait. But here, you just seem to be printing stuff out and then forgetting it.
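Here's a minimal sketch of that as_completed pattern. Note that run_command here is a dummy stand-in that just returns a value or raises, not the pexpect version, so the example focuses purely on how results and exceptions come back through the futures:

```python
import concurrent.futures

# Stand-in task: the real run_command would drive pexpect; this dummy
# just returns a value or raises, to show how results and exceptions
# come back through the futures.
def run_command(path, arg):
    if arg < 0:
        raise ValueError('bad arg for %s' % path)
    return '%s:%d' % (path, arg)

some_iterable = [('a', 1), ('b', 2), ('c', -1)]

results, errors = [], []
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as x:
    fs = [x.submit(run_command, path, arg) for path, arg in some_iterable]
    # as_completed yields each future as soon as its task finishes,
    # regardless of submission order.
    for f in concurrent.futures.as_completed(fs):
        try:
            results.append(f.result())  # re-raises the task's exception, if any
        except Exception as e:
            errors.append(e)
```

The same loop works unchanged whether the tasks succeed or fail; f.result() is the single place where both values and exceptions surface.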

If the path, arg pairs really do come straight out of an iterable like this, it's often simpler to use x.map(run_command, *zip(*some_iterable)). (Plain x.map(run_command, some_iterable) would pass each (path, arg) tuple as a single argument.)

All of this (and other options, too) is explained pretty nicely in the module docs.


Also see the pexpect FAQ and common problems. I don't think there are any issues that will affect you here in current versions (we're always spawning the child and interacting with it entirely in a single thread-pooled task), but I vaguely remember there used to be an additional problem in the past (something to do with signals?).


* I think asyncio would be a better solution, except that as far as I know none of the attempts to fork or reimplement pexpect in a nonblocking way are complete enough to actually use…

abarnert
  • Thanks for your prompt reply, but it still does not run in the background; it waits for one to complete before proceeding to the next. Also, on your question "In your case, it sounds like you're running hundreds of processes, so you want to run some of them in parallel, but maybe not all of them at once?": running them in parallel would be the solution, but since there would then be a large number of processes, all I am looking to implement is that as soon as it spins up the subprocess, it logs the interactive output to the log file instead of waiting there ... – dheeraj tripathi Mar 14 '18 at 19:32
  • @dheerajtripathi First, with the `Executor` code, it should be running 8 of these at once. It will wait for one of the 8 to finish before starting another one; that's the only way to ensure it runs only 8 at a time instead of all of them simultaneously. If you really want to run all of them simultaneously, you can just drop the `Executor` and spawn a thread for each one. If you want that, and don't understand how to do it, I can write some sample code for it. – abarnert Mar 14 '18 at 19:38
  • @dheerajtripathi Second, I'm not sure I understand the last part of your comment. Are you saying you need to interact with the process initially, but after that you can just dump it off to the background, ignore what it does, and start up the next process? – abarnert Mar 14 '18 at 19:39
  • "Are you saying you need to interact with the process initially, but after that you can just dump it off to the background, ignore what it does, and start up the next process?" was thinking of that , but I realise it could get ugly ? – dheeraj tripathi Mar 14 '18 at 19:57
  • On the comment "First, with the Executor": I'm a bit confused there, as I've never used an executor. The code runs 8 of these threads at once, but each thread runs one subprocess, so isn't only 1 thread being used at a time? If I still need to wait for the interactive process to complete before the control goes on to spin up the new subprocess, isn't it still the same as the old way without the executor? I'm really stuck understanding how the Executor is helping me here. Will it reduce my execution time (as it still seems not to proceed until one completes)? Appreciate your help – dheeraj tripathi Mar 14 '18 at 20:02
  • @dheerajtripathi The executor runs 8 threads at once. Each of those threads runs one subprocess. So there are 8 subprocesses running at any time. Sure, the 8 threads are all wasting most of their time waiting 15 seconds, but there are 8 of them doing it at once. If you have 200 to run, and each one takes 15 seconds, your total time should go from 3000 seconds to just over 375 seconds. – abarnert Mar 14 '18 at 20:07
  • @dheerajtripathi Meanwhile, dumping them all in the background after the initial interaction and then dealing with them all serially could be ugly (especially if you mix it with threads for spawning them faster), but it may be worth the ugliness. If it takes 1 second to do the initial interactive work, then 14 seconds to wait, it could take 214 seconds, or 37 seconds if you mix the two solutions. But, on the other hand, it could be too fragile or complicated to use. – abarnert Mar 14 '18 at 20:10
  • Probably the last question :), in my use case am I not just using one process? so that is only one of the thread (to completion) being used? Can you please suggest on how can I make it more execution(time) efficient? – dheeraj tripathi Mar 14 '18 at 20:12
  • @dheerajtripathi I'm not sure I understand the question. Your script is just a single process, but it can have multiple threads running concurrently within that process. The point of the executor is to run 8 threads at a time for waiting on child processes, instead of 1 at a time. You still have to wait on all 200 children to finish, but if you're waiting for 8 of them at a time instead of 1 at a time, it should take 1/8th as long. (Of course parallel scaling is never _perfect_, especially since the children are presumably doing 15 seconds worth of CPU time or disk access or whatever…) – abarnert Mar 14 '18 at 20:21
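The "drop the Executor and spawn a thread for each one" variant mentioned in the comments can be sketched like this, again with a dummy stand-in for the pexpect work:

```python
import threading

# Stand-in for the per-child work; the real version would spawn the
# pexpect child and do the initial interaction.
done = []
lock = threading.Lock()

def run_command(path, arg):
    with lock:  # guard the shared list while threads append to it
        done.append((path, arg))

some_iterable = [('a', 1), ('b', 2), ('c', 3)]

# One thread per task: every child runs at once, with no cap on
# concurrency (unlike the 8-worker executor above).
threads = [threading.Thread(target=run_command, args=(path, arg))
           for path, arg in some_iterable]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for all of them to finish
```

The trade-off versus the executor is exactly the one discussed above: with hundreds of children, all of them run simultaneously, which may be more load than you want.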

If you don't actually want to interact with lots of processes in parallel, but instead want to interact with each process briefly, then just ignore it while it runs and move on to interacting with the next one…

# Do everything up to the final `interact`. After that, the child
# won't be writing to us anymore, but it will still be running for
# many seconds. So, return the child object so we can deal with it
# later, after we've started up all the other children.
def start_command(path, arg):
    try:
        child = pexpect.spawn(('some command --path {0}  somethingmore --args {1}').format(path, arg), timeout=300)
        child.logfile = open(file_name,'w')
        child.expect('x*')
        child.sendline(something)
        child.expect('E*')
        child.sendline(something)
        # child.read()
        child.interact()
        return child
    except Exception as e:
        print "Exception in child process"
        print str(e)

# First, start up all the children and do the initial interaction
# with each one.
children = []
for path, arg in some_iterable:
    children.append(start_command(path, arg))

# Now we just need to wait until they're all done. This will get
# them in as-launched order, rather than as-completed, but that
# seems like it should be fine for your use case.
for child in children:
    try:
        if child is None:
            continue  # start_command hit an exception and already reported it
        child.wait()
        print child.status
    except Exception as e:
        print "Exception in child process"
        print str(e)

A few things:

Notice from the code comments that I'm assuming the child isn't writing anything to us (and waiting for us to read it) after the initial interaction. If that's not true, things are a bit more complicated.

If you want to not only do this, but also spin up 8 children at a time, or even all of them at once, you can (as shown in my other answer) use an executor or just a mess of threads for the initial start_command calls, and have those tasks/threads return the child object to be waited on later. For example, with the Executor version, each future's result() will be a pexpect child process. However, you definitely need to read the pexpect docs on threads in that case—with some versions of linux, passing child-process objects between threads can break the objects.
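A minimal sketch of that combination, with stand-ins for the pexpect calls (start_command and wait_on here are dummies; the real versions would wrap pexpect.spawn(...)/interact() and child.wait() respectively):

```python
import concurrent.futures

# Stand-ins: start_command does the brief interactive part and returns a
# handle; wait_on does the long final wait on that handle.
def start_command(path, arg):
    return {'path': path, 'arg': arg}  # pretend child-process handle

def wait_on(child):
    return '%s done' % child['path']

some_iterable = [('a', 1), ('b', 2)]

# Phase 1: run the interactive starts through the executor, up to 8 at a
# time. Each future's result() is the child handle returned by the task.
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as x:
    fs = [x.submit(start_command, p, a) for p, a in some_iterable]
    children = [f.result() for f in fs]  # results come back in submit order

# Phase 2: wait on the children serially, now that they're all started.
statuses = [wait_on(c) for c in children]
```

As noted above, whether real pexpect child objects survive being handed between threads like this depends on your pexpect version and platform, so check the pexpect threading docs before relying on it.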

Finally, since you will now be seeing things much more out-of-order than the original version, you might want to change your print statements to show which child you're printing for (which also probably means changing children from a list of children to a list of (child, path, arg) tuples or the like).

abarnert