
I want to write a script that would work a bit like hadoop streaming: I provide a random "client" program's path, and from my host python script I "pipe" strings to the client, and I want to receive the client's stdout in my python program.

So, for instance, if I have the following basic Python client "client.py":

import sys

for line in sys.stdin:
    print("printing : " + line)

I want, from my python host, to be able to call the executable "python client.py", provide it with the list ["a", "b"], and then receive ["printing a", "printing b"] as a result.

Here's what I tried in my host code:

import subprocess
proc = subprocess.Popen("python client.py", stdout=subprocess.PIPE, stdin=subprocess.PIPE)

for text in ["a", "b"]:
    print(text)
    proc.stdin.write(bytes(text, 'UTF-8'))
    result = proc.stdout.read()
    print("result " + str(result))
    proc.wait()

However, on Windows it executes print(text), then opens a python.exe window that remains frozen. Does anyone know how to accomplish what I'm trying to do? Ideally it should work on both Windows and Linux.

Edit: in my real application the data to send to stdin is tens of thousands of lines of ~1K chars each, so I can't just send it all at once. The output from stdout should be tens of thousands of lines of about 10 chars each.

Chris Martin
lezebulon
  • To be clear, what version of Python are you targeting? There are good, threadless ways to accomplish this without deadlocking using stuff like the [`selectors` module](https://docs.python.org/3/library/selectors.html), but that requires 3.4 or higher. Pre-3.4, you'd either use the lower level primitives from the [`select` module](https://docs.python.org/3/library/select.html#module-select), or "cheat" and use two threads, one to feed the subprocess, one to consume its output. – ShadowRanger Mar 09 '16 at 03:09
  • Also, side-note: When you're completely done feeding input to the child process, you _must_ call `close` on the `stdin` of the child in the parent. Otherwise the final writes are not flushed, and even if they are flushed, the child has no way to know that there is no more `stdin` to read; it will block forever waiting on it. – ShadowRanger Mar 09 '16 at 03:11

2 Answers


The problem is that read() tries to read the entire stream, which means it waits until the subprocess terminates. You need to determine a way to know when a character is available. Here are some ways to do it:

  1. Read one character at a time until the return character (end-of-line) is encountered.
  2. The sub application can send constant length outputs. You can specify the length of characters in the read method.
  3. The sub application can announce how many characters it will print.

You also need a condition to tell the subprocess to end. For example, when it receives a special string.

Another problem can come from buffering: data may not be transmitted immediately after a write operation. In this case, you can use flush() to guarantee delivery.
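As a rough Python 3 illustration of the first option combined with flush(), the sketch below sends one line, flushes, and reads back exactly one reply line with readline(). It assumes the client answers each input line with exactly one output line and flushes after every reply. The names CHILD_CODE and exchange_lines are made up for this example, and the client is passed inline through sys.executable only so the snippet is self-contained; in practice you would launch your own client.py.

```python
import subprocess
import sys

# Hypothetical stand-in for client.py, passed inline so the sketch runs on its own.
CHILD_CODE = r"""
import sys
for line in iter(sys.stdin.readline, ""):
    sys.stdout.write("printing : " + line)
    sys.stdout.flush()  # without this the reply can sit in the child's buffer
"""


def exchange_lines(lines):
    """Send each line to the child and read back exactly one reply line."""
    replies = []
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode, so we write and read str
    ) as proc:
        for text in lines:
            proc.stdin.write(text + "\n")
            proc.stdin.flush()  # push the line to the child right away
            # readline() returns at the end of the line instead of waiting for EOF
            replies.append(proc.stdout.readline().rstrip("\n"))
    # leaving the with-block closes the pipes (EOF for the child) and waits for it
    return replies


if __name__ == "__main__":
    print(exchange_lines(["a", "b"]))
```

This only works in lock step: if the client ever produces zero or several lines for one input line, the host will block or fall out of sync.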

I know your code above is in Python 3, but to avoid the problems of unicode conversions, the following programs are in Python 2. You should have no problems converting them to Python 3.

Program client.py

# python2
import sys

do_run = True
while do_run:
  i = ''
  line = ''
  while i != '\n':   # read one char at a time until RETURN
    i = sys.stdin.read(1)
    if i == '':      # EOF: the host closed our stdin
      break
    line += i
  # stop on the END sentinel or on EOF; otherwise echo the line back
  if line.startswith("END") or i == '':
    do_run = False
  else:
    sys.stdout.write("printing : " + line)  # RET already in line
    sys.stdout.flush()

Program main.py

from subprocess import Popen, PIPE

proc = Popen(["python2", "client.py"], stdout=PIPE, stdin=PIPE, stderr=PIPE)

for text in ('A', 'B', 'C', 'D', 'E'):
  print text
  proc.stdin.write(text + "\n")
  proc.stdin.flush()            # push the line to the client now
  i = ''
  result_list = []
  while i != '\n':              # read the reply one char at a time
    i = proc.stdout.read(1)
    result_list.append(i)
  print ("result " + "".join(result_list))

proc.stdin.write("END\n")       # tell the client to stop
proc.stdin.close()              # flushes the pending write and signals EOF
proc.wait()

I ran the programs above on a Raspberry Pi (Raspbian) and they worked. However, when I commented out the lines with flush(), the program hung.

These programs use the first option (read one char at a time), which is probably the slowest. You can improve speed by using the other two, at the expense of more complicated code.
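For comparison, here is a minimal Python 3 sketch of the third option, where each side announces the length of a message before sending it. The framing (a decimal byte count on its own line, followed by the payload) is just one convention chosen for this illustration, and CHILD_CODE, send_message, read_message and round_trip are hypothetical names. The client is again passed inline so the snippet is self-contained.

```python
import subprocess
import sys

# Hypothetical client: reads "<length>\n<payload>" frames and answers in the same format.
CHILD_CODE = r"""
import sys
stdin, stdout = sys.stdin.buffer, sys.stdout.buffer
while True:
    header = stdin.readline()            # e.g. b"5\n"
    if not header:                       # EOF: the host closed the pipe
        break
    payload = stdin.read(int(header))    # read exactly that many bytes
    reply = b"printing : " + payload
    stdout.write(str(len(reply)).encode() + b"\n" + reply)
    stdout.flush()
"""


def send_message(proc, payload):
    """Write one length-prefixed frame to the child's stdin."""
    proc.stdin.write(str(len(payload)).encode() + b"\n" + payload)
    proc.stdin.flush()


def read_message(proc):
    """Read one length-prefixed frame from the child's stdout."""
    header = proc.stdout.readline()
    return proc.stdout.read(int(header))


def round_trip(messages):
    """Send each message and collect the matching reply."""
    replies = []
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    ) as proc:
        for message in messages:
            send_message(proc, message)
            replies.append(read_message(proc))
    return replies


if __name__ == "__main__":
    print(round_trip([b"a", b"two\nlines"]))
```

Because the length is known up front, the payload may itself contain newlines, and each read fetches the whole message in one call rather than one character at a time.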

Sci Prog

For interacting with child processes (for instance, to read 'prompts' and react to them) pexpect is the way to go:

https://pexpect.readthedocs.org/en/stable/

However, if you don't care about interacting "intelligently" and just want to send a bunch of lines and echo them...

in client.py:

from sys import stdin

for line in stdin:
    print(line,end="")

and in your host file:

from subprocess import Popen, PIPE

text = b"a\nb\n"

sub = Popen(["python3","client.py"],stdout=PIPE,stdin=PIPE).communicate(text)

print(sub[0].decode())

In light of your Edit, see new hostfile below:

import os
from pty import fork
from time import sleep

inputs = [b"a", b"b"]

pid, fd = fork()  # pid is 0 in the child; fd is the pty master in the parent

if pid == 0:
    # child: replace this process with the client
    os.execv("/usr/bin/python3", ["/usr/bin/python3", "/path/to/file/client.py"])

for each in inputs:
    os.write(fd, each + b'\n')
    sleep(0.5)
    os.read(fd, len(each))  # We have to get rid of the echo of our write
    print(os.read(fd, 200).decode().strip())

There are also issues with using the sys.stdin method (the one used with Popen) in the client, because the input is not there yet when the client launches, so we need to make it block. A (very simple) example:

i = input()
print("printing {0}".format(i))
i = input()
print("printing {0}".format(i))

This will not work on Windows (unless someone has implemented forking there and I'm not aware of it). I'm not sure how to do it on Windows, as I spend no time there.

There are significant limitations here. It's synchronous, for one, and os.read() is not exactly high level.
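One possible way around the synchronous limitation, which should also work on Windows since it only uses subprocess and threading, is the two-thread approach suggested in the comments on the question: one thread feeds the child's stdin while the main thread consumes its stdout. The sketch below is untested against the real workload and assumes the client echoes one line per input line; CHILD_CODE and stream_through_child are made-up names, and the client is passed inline only so the snippet is self-contained.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in for client.py.
CHILD_CODE = r"""
import sys
for line in sys.stdin:
    sys.stdout.write("printing : " + line)
"""


def stream_through_child(lines):
    """Feed lines to the child from a helper thread while reading its output."""
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode
    ) as proc:

        def feed():
            for text in lines:
                proc.stdin.write(text + "\n")
            # Closing stdin flushes pending data and tells the child there is no more input.
            proc.stdin.close()

        feeder = threading.Thread(target=feed)
        feeder.start()
        # Reading here while the feeder writes keeps either pipe from filling up and stalling.
        results = [line.rstrip("\n") for line in proc.stdout]
        feeder.join()
    return results


if __name__ == "__main__":
    out = stream_through_child(["x" * 1000] * 10000)
    print(len(out))
```

The results are collected in a list only to keep the example short; for a real stream you would process each line inside the loop over proc.stdout instead.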

Keozon
  • thanks ! but cf my edit: I intend to send tons of data to the input stream so it has to be a real stream, same for stdout – lezebulon Mar 08 '16 at 23:51
  • Yeah, posted too slow. Updated. hth – Keozon Mar 09 '16 at 00:17
  • I had some errors in my first update -- I had not tested anything. Now it works :) – Keozon Mar 09 '16 at 00:41
  • The `.readlines()` bit is thoroughly pointless; iterating a file-like object like `stdin` already gets the lines one at a time, adding `.readlines()` forces _all_ lines to be read into a `list` before the first line is iterated, lowering responsiveness and increasing peak memory usage. – ShadowRanger Mar 09 '16 at 03:06
  • @ShadowRanger thank you, I did not know that. Not sure how I never ran into that, but I guess now I have. I'll update my post to fix that. – Keozon Mar 09 '16 at 03:51