
I need to write two programs which will be run as a parent process and its child. The parent process spawns the child and then they communicate via a pair of pipes connected to the child's stdin and stdout. The communication is peer-to-peer, that's why I need asyncio; a simple read/reply loop won't do.

I have written the parent. No problem because asyncio provides everything I needed in create_subprocess_exec().
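
For context, the parent side looks roughly like this (a simplified sketch reduced to a single exchange; "child.py" and the line-based messages are just placeholders for my real program):

import asyncio
import sys

async def main():
    # spawn the child with pipes attached to its stdin and stdout
    proc = await asyncio.create_subprocess_exec(
        sys.executable, 'child.py',
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE)

    # proc.stdin is a StreamWriter and proc.stdout is a StreamReader,
    # so sending and receiving can run as independent tasks;
    # here they are reduced to one request/reply for brevity
    proc.stdin.write(b'ping\n')
    await proc.stdin.drain()
    reply = await proc.stdout.readline()
    print('child replied:', reply)

    proc.stdin.close()
    await proc.wait()

asyncio.run(main())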

However, I don't know how to create a similar stream reader/writer in the child. I did not expect any problems, because the pipes are already created and file descriptors 0 and 1 are ready to use when the child process starts. No connection needs to be opened, no process needs to be spawned.

This is my non-working attempt:

import asyncio
import sys

_DEFAULT_LIMIT = 64 * 1024

async def connect_stdin_stdout(limit=_DEFAULT_LIMIT, loop=None):
    if loop is None:
        loop = asyncio.get_event_loop()
    reader = asyncio.StreamReader(limit=limit, loop=loop)
    protocol = asyncio.StreamReaderProtocol(reader, loop=loop)
    r_transport, _ = await loop.connect_read_pipe(lambda: protocol, sys.stdin)
    w_transport, _ = await loop.connect_write_pipe(lambda: protocol, sys.stdout)
    writer = asyncio.StreamWriter(w_transport, protocol, reader, loop)
    return reader, writer

The problem is that I have two transports where I should have one. The function fails because it tries to set the protocol's transport twice:

await loop.connect_read_pipe(lambda: protocol, sys.stdin)
await loop.connect_write_pipe(lambda: protocol, sys.stdout)
# !!!! assert self._transport is None, 'Transport already set'

I tried to pass a dummy protocol to the first call, but then this line is not correct either, because both transports are needed, not just one:

writer = asyncio.StreamWriter(w_transport, protocol, reader, loop)

I guess I need to somehow combine the two unidirectional transports into one bidirectional transport. Or is my approach entirely wrong? Could you please give me some advice?


UPDATE: after some testing, this seems to work (but it does not look good to me):

async def connect_stdin_stdout(limit=_DEFAULT_LIMIT, loop=None):
    if loop is None:
        loop = asyncio.get_event_loop()
    reader = asyncio.StreamReader(limit=limit, loop=loop)
    protocol = asyncio.StreamReaderProtocol(reader, loop=loop)
    dummy = asyncio.Protocol()
    await loop.connect_read_pipe(lambda: protocol, sys.stdin) # sets read_transport
    w_transport, _ = await loop.connect_write_pipe(lambda: dummy, sys.stdout)
    writer = asyncio.StreamWriter(w_transport, protocol, reader, loop)
    return reader, writer
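
To show how I intend to use it, here is a simplified echo loop in the child (just an illustration; the real protocol is more involved):

async def child_main():
    reader, writer = await connect_stdin_stdout()
    while True:
        line = await reader.readline()
        if not line:            # EOF: the parent closed our stdin
            break
        writer.write(line)      # echo the line back to the parent
        await writer.drain()

asyncio.run(child_main())
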
  • https://github.com/python/asyncio/issues/213 and https://gist.github.com/nathan-hoad/8966377 are strongly related. – Martijn Pieters Sep 07 '18 at 10:10
  • You can see my question and answer to a similar question at https://stackoverflow.com/questions/52495824 – yoonghm Oct 07 '18 at 14:53
  • Can you elaborate on why you think your own attempt 'does not look good'? – Martijn Pieters Oct 08 '18 at 12:56
  • @yoonghm: that's something quite different, you are not using asyncio there. – Martijn Pieters Oct 08 '18 at 12:57
    @MartijnPieters "does not look good" simply means that I think I am using the asyncio stream functions and classes not the way they were designed or documented. – VPfB Oct 08 '18 at 14:33
  • @MartijnPieters, yes, my solution is based on multiprocessing instead of coroutines and threads. My proposal was to use multiprocessing, passing data using shared memory, with notification and synchronization using a pipe. I could not find a usable Unix-equivalent signal on the Windows platform. The IPC pattern could be scaled up. – yoonghm Oct 08 '18 at 15:27
  • @yoonghm Let me add some background information. I'm rewriting existing software, originally written using a different asyncio-style library, to standard asyncio, which I have started to like and prefer since Python 3.7. I want to keep the changes as small as possible. – VPfB Oct 08 '18 at 16:19
  • This looks very promising (Python 3.8 is scheduled for October 2019): https://bugs.python.org/issue36889 – VPfB May 28 '19 at 11:25
  • New Python 3.8 Streams were pulled out in the very last moment: https://bugs.python.org/issue38242 – VPfB Oct 23 '19 at 13:02
  • Updated link to the Nathan Hoad gist: https://gist.github.com/nhoad/8966377 – mikepurvis Sep 01 '20 at 11:07

1 Answer


Your first version fails because you are using the wrong protocol for the writer side; the StreamReaderProtocol implements hooks to react to incoming connections and data, something the writing side doesn't and shouldn't have to deal with.

The loop.connect_write_pipe() coroutine uses the protocol factory you pass in and returns the resulting protocol instance. You do want to use that same protocol object in the stream writer, instead of the protocol used for the reader.

Next, you do not want to pass the stdin reader to the stdout stream writer! That class assumes that the reader and writer are connected to the same file descriptor, and that's really not the case here.

In the recent past I've built the following to handle stdio for a child process; the stdio() function is based on the Nathan Hoad gist on the subject, plus a fallback for Windows, where support for treating stdio as pipes is limited.

You do want the writer to handle backpressure properly, so my version uses the (undocumented) asyncio.streams.FlowControlMixin class as the protocol for this; you really don't need anything more than that:

import asyncio
import os
import sys

async def stdio(limit=asyncio.streams._DEFAULT_LIMIT, loop=None):
    if loop is None:
        loop = asyncio.get_event_loop()

    if sys.platform == 'win32':
        return _win32_stdio(loop)

    reader = asyncio.StreamReader(limit=limit, loop=loop)
    await loop.connect_read_pipe(
        lambda: asyncio.StreamReaderProtocol(reader, loop=loop), sys.stdin)

    writer_transport, writer_protocol = await loop.connect_write_pipe(
        lambda: asyncio.streams.FlowControlMixin(loop=loop),
        os.fdopen(sys.stdout.fileno(), 'wb'))
    writer = asyncio.streams.StreamWriter(
        writer_transport, writer_protocol, None, loop)

    return reader, writer

def _win32_stdio(loop):
    # no support for asyncio stdio yet on Windows, see https://bugs.python.org/issue26832
    # use an executor to read from stdio and write to stdout
    # note: if nothing ever drains the writer explicitly, no flushing ever takes place!
    class Win32StdinReader:
        def __init__(self):
            self.stdin = sys.stdin.buffer
        async def readline(self):
            # a single call to sys.stdin.readline() is thread-safe
            return await loop.run_in_executor(None, self.stdin.readline)

    class Win32StdoutWriter:
        def __init__(self):
            self.buffer = []
            self.stdout = sys.stdout.buffer
        def write(self, data):
            self.buffer.append(data)
        async def drain(self):
            data, self.buffer = self.buffer, []
            # a single call to self.stdout.writelines() is thread-safe
            await loop.run_in_executor(None, self.stdout.writelines, data)
            # flush so the data actually reaches the pipe, not just the buffer
            await loop.run_in_executor(None, self.stdout.flush)

    return Win32StdinReader(), Win32StdoutWriter()
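
For illustration only (this usage sketch is not part of the original code), a child process could then drive these streams along these lines; only readline(), write() and drain() are used, so it stays within what both code paths above provide:

async def main():
    reader, writer = await stdio()
    while True:
        line = await reader.readline()
        if not line:                  # EOF: the parent closed our stdin
            break
        writer.write(b'echo: ' + line)
        await writer.drain()          # drain to flush and apply backpressure

asyncio.run(main())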

While perhaps a little outdated, I found this 2016 blog post by Nathaniel J. Smith on asyncio and curio to be hugely helpful in understanding how asyncio, protocols, transports, backpressure and such all interact and hang together. That article also shows why creating the reader and writer objects for stdio is so verbose and cumbersome at the moment.

Martijn Pieters
  • I posted the question, but I cannot judge the accuracy of this answer. @user4815162342, who offered the bounty, is much more experienced and better placed to do that. – VPfB Oct 08 '18 at 14:38
  • @VPfB: sure, that's fine. I've added a link to a blog post that helped me make huge strides in understanding the concepts involved here. – Martijn Pieters Oct 08 '18 at 15:06
  • I don't think the `run_in_executor` calls are necessary if you know that the handles actually refer to pipes - for example, `asyncio.subprocess` manages to support win32 without them. Also, the win32 code as implemented here will only support `readline()` on the stream, but not all other operations, such as `read(n)`, `read()`, `readexactly(n)`, etc. – user4815162342 Oct 08 '18 at 17:58
  • The crux of the question is: `asyncio.subprocess` provides a nice async API to spawn a subprocess and talk to it. But if we also want to use asyncio **in** the subprocess, there is no obvious way to talk to the parent. Supporting win32 by going through `run_in_executor` will of course work, but it requires duplicating the whole stream API, and it ultimately feels unnecessary, given that `asyncio.subprocess` manages to do without it. – user4815162342 Oct 08 '18 at 17:59
  • @user4815162342: the subprocess pipes are opened using the windows APIs to set the required OVERLAPPED flags, these flags are *not* set on stdin / stdout and you can't do so from Python code. See https://bugs.python.org/issue26832 for more info. – Martijn Pieters Oct 08 '18 at 19:02
  • @user4815162342: I agree that it is frustrating that there is such a disconnect between pipes to a child process and standard I/O on Win32. I haven't had the time or resources to go poke the Windows API with a sharp `ctypes` stick to get the descriptors re-opened with the requisite flags set, and to be frank, this is something asyncio library should take care of for us. – Martijn Pieters Oct 08 '18 at 19:07
  • @user4815162342: the approach in my answer here *works today*. We can add the additional `read()` calls as needed. – Martijn Pieters Oct 08 '18 at 19:08
  • It works for `readline()`, but I'm not sure it can replicate subtler aspects of `StreamReader`. For example, a `StreamReader` provides both `read(n)` (to read some available data, but no more than *n* bytes) and `readexactly(n)` (to read exactly *n* bytes). The stdio layer doesn't provide a "read some bytes" operation, it will always try to either read all the bytes or reach EOF. – user4815162342 Oct 08 '18 at 19:32
  • @user4815162342: no, [`file.read()`](https://docs.python.org/3/library/io.html#io.BufferedIOBase.read) works exactly like `StreamReader.read()`; you pass in `n` and get up to `n` bytes back. `readexactly(n)` is just a loop and a buffer, calling `read()` with the difference between data buffered so far and what was requested. – Martijn Pieters Oct 08 '18 at 20:27
  • `file.read` blocks for as long as it takes to either read *n* bytes or reach EOF. `StreamReader.read` will happily return some data as soon as it has them. This is demonstrated in [this code](https://pastebin.com/nk3KAXWA) - if you use `nc -lp8080` to listen on a socket, you can type a line and it will show up immediately on asyncio side, without waiting for the full 1024 bytes to appear. Your imitation of `read()` won't work like that - if some data appears on the pipe, it won't return it, but will keep the coroutine suspended until the whole request is satisfied. – user4815162342 Oct 09 '18 at 05:11
  • That kind of thing is what I meant by subtler aspect of stream reader's behavior - while it is easy to get something that looks like and appears to work like a stream reader, it is actually not trivial to correctly and fully emulate it. – user4815162342 Oct 09 '18 at 05:14
  • @user4815162342: yes, I'm aware that the asyncio streams will not block; there is a reason the fallback focuses on readline only, as that's the simpler case here and sufficed in the use cases I dealt with so far. With additional work it *can* be made to work reasonably well in a thread executor to approximate the other methods. – Martijn Pieters Oct 09 '18 at 17:17
  • @user4815162342: however, what *really* should be done is porting the work done for twisted to asyncio: https://twistedmatrix.com/trac/ticket/2157. It is probably possible to build on top of the [proactor event loop](https://docs.python.org/3/library/asyncio-eventloops.html#asyncio.ProactorEventLoop) together with either the pywin32 `win32pipes` / `win32console` modules or ctypes bindings of the same API [such as the one Enthought built around other such APIs](https://github.com/enthought/pywin32-ctypes), to make this work properly. That's the Python issue I pointed to already. – Martijn Pieters Oct 09 '18 at 17:32
  • @MartijnPieters I was responding to your statement that your code works _today_, which is unfortunately not the case. It doesn't provide a `read` and, as shown above, it's not trivial to implement it to match asyncio semantics. – user4815162342 Oct 09 '18 at 20:20
  • @user4815162342: right, for the use cases I needed, the code works today. I understand that if you need more functionality, the win32 fallback won't work without additional work. I don't think we need to go over this again though, do we? – Martijn Pieters Oct 09 '18 at 20:29
  • @MartijnPieters Not "again", because you are now adding a new qualification about particular use cases. Last time we discussed this - which was today - you claimed that `file.read` was perfectly adequate for implementing StreamReader-like `read`, which is not the case. Sorry, but your answer does not address the question and is certainly not worthy of the bounty. – user4815162342 Oct 09 '18 at 21:14
  • @user4815162342: I stand by that remark, because with enough work, you can make a shim that will work well enough for the use-case in the question. It's just not a trivial amount of work, and I'd rather see the time go to implementing win32 console support in asyncio proper. I do not have that time myself. I'm sorry that you feel that the answer is not bounty worthy; you didn't make any assertions about what kind of answer you were looking for, there was no description with the bounty nor a comment on the question. – Martijn Pieters Oct 09 '18 at 21:19
  • @user4815162342: at any rate, to the specific question of how to make asyncio work well with **Windows console streams**, you are not likely to get a better answer at this point in time, nor is the specific question that I answered asking for that. Perhaps you want to put a bounty on [asyncio cannot read stdin on Windows](//stackoverflow.com/q/31510190) instead, which *does* ask for exactly that specific problem to be solved. – Martijn Pieters Oct 09 '18 at 21:23
  • @MartijnPieters Again, you didn't even show how the shim could work, nor did you respond to my showing that it *won't* work with the combination of blocking streams and `run_in_executor`. The answer is not worthy of bounty because it doesn't answer the question, which is clearly about getting a stream reader (or compatible) API which is not provided here, and not due to some additional constraints of the bounty, as you are now implying. – user4815162342 Oct 09 '18 at 21:30
  • @user4815162342: I stated several times that the win32 support for streams with stdio is severely lacking, and linked to the relevant Python bug tracker issue for this. We then discussed some of the limitations of the shim, and some possible extensions. That could have been it, but I have the feeling that you are more about making demands here than having a discussion about the possibilities or options. As such, I'm just going to leave this conversation. – Martijn Pieters Oct 09 '18 at 21:34
  • @MartijnPieters You keep stating that the shim could be improved, but fail to specify how that could be done (other than by implementing full support for win32 streams in asyncio). I am genuinely curious how you propose to extend the shim to implement `read()` with the stream reader semantics on top of `file.read` and `run_in_executor`. Or, are you claiming that that is not necessary? If so, why not? `read()` is probably the most basic operation provided by `StreamReader`, and omitting it, or not getting its semantics right, makes the code incomplete at best. – user4815162342 Oct 09 '18 at 21:45
  • MartijnPieters, @user4815162342 Gentlemen, could you agree on the Linux/Unix part disregarding win32 architecture for now? – VPfB Oct 10 '18 at 15:40
  • @VPfB: I stand by the non-Win32 implementation; that's solid. The win32 shim is sufficient if the parent sends data that is newline terminated. For other uses you'd have to expand the shim to meet the use-cases. As the conversation with user4815162342 highlights, implementation parity is possible but non-trivial, and I recommend anyone that wants to attempt that to try to fix the asyncio standard library to handle this for you, instead. – Martijn Pieters Oct 12 '18 at 08:00
  • @user4815162342: let me try to rephrase this then: yes, I think it is possible to build a win32 shim that will be close to the POSIX behaviour. However, that will be a lot of work, time that would be better spent in making the asyncio library itself take care of the underlying issue natively (e.g. by poking the win32 console and pipe APIs from C). Until then, it is possible to produce shims that meet *specific usecases*; mine here aims to work for line-based communication with a parent process, for which using a thread executor is sufficient. I'm not going to spend more time on it. – Martijn Pieters Oct 12 '18 at 08:03
  • @VPfB The issue with the Unix part is that it directly instantiates a mixin, which is normally not done. We don't know if that is correct or if that code will keep working because the mixin is undocumented. Also, the code accesses the private name `asyncio.streams._DEFAULT_LIMIT` which is certainly not correct. But since the code is based on an existing (and presumably field-tested) gist, it might work well despite the issues. I would consider using it in production, but would also file a bugs.python.org issue to document `FlowControlMixin` or provide an alternative for this purpose. – user4815162342 Oct 15 '18 at 12:51
  • @user4815162342: it's another unfortunate issue with asyncio: the API is still unorganized, cluttered and in places, incomplete. The [`FlowControlMixin` class documentation](https://github.com/python/cpython/blob/1bf9cc509326bc42cd8cb1650eb9bf64550d817e/Lib/asyncio/streams.py#L147-L155) shows how useful it is, and it is a direct subclass of `Protocol`. That it is undocumented is a side effect of the assumption that reader and writer are always used in tandem (two sides of the same file descriptor), but stdin / stdout / stderr is not such a use case yet would have back pressure issues without. – Martijn Pieters Oct 15 '18 at 14:11
  • @user4815162342: the API saw several big improvements in Python 3.7, more will follow. I'm sure `FlowControlMixin` will either become available as a documented part (perhaps under an alias), or a better alternative for handling the stdio streams will become available. – Martijn Pieters Oct 15 '18 at 14:13
  • The issue was brought to attention of asyncio developers: https://bugs.python.org/issue34993 – VPfB Oct 15 '18 at 19:12