
I'd like to produce a readable HTML report of a process run. Towards that end, I'd like to track both stdout and stderr, and output them interleaved, yet distinguished - for example, the log will show them both combined according to the order they were emitted, but with stdout in black and stderr in bold red.

I can readily see a solution that will keep them distinguished: just redirect each to `subprocess.PIPE`. Of course, then they can't be recombined in order. It's also easy to unify them in order: just redirect stderr to `subprocess.STDOUT`. However, then they will be indistinguishable.
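For concreteness, here is a minimal sketch of the two straightforward options (the child command here is a hypothetical stand-in that writes one line to each stream):

```python
import subprocess
import sys

# Hypothetical child process that writes one line to stdout and one to stderr.
cmd = [sys.executable, "-c",
       "import sys; print('out'); print('err', file=sys.stderr)"]

# Option 1: distinguished but separate -- each stream is captured
# independently, so the relative order of writes is lost.
p1 = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# Option 2: interleaved in emission order but indistinguishable --
# stderr is merged into stdout, and p2.stderr is None.
p2 = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
```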

Thus getting the outputs to be either distinguished or combined in order is straightforward, but getting both is not.

What's the way to do that in Python?

Dun Peal
  • I would just wrap stdout and stderr in different `` classes as I produce the output – NullUserException Sep 30 '19 at 16:21
  • @NullUserException sure, but how do you keep them in order, yet distinguished? I can either redirect both to `subprocess.PIPE`, in which case they'd be distinguished but separate, or I can redirect `stderr` to `stdout`, in which case they will be unified but not distinguished. – Dun Peal Sep 30 '19 at 16:35
  • You *can't* retain perfect order while distinguishing them. UNIX doesn't provide the guarantees necessary to make it possible. It's possible to get close, sure, but to have a guarantee that what you have is perfect, you need to use a syscall-monitoring mechanism to reconstruct the writes. – Charles Duffy Sep 30 '19 at 16:50
  • It's tagged bash rather than python, but everything discussed in [Separately redirecting and recombining stdout/stderr without losing ordering](https://stackoverflow.com/questions/45760692/separately-redirecting-and-recombining-stderr-stdout-without-losing-ordering) applies. – Charles Duffy Sep 30 '19 at 16:52
  • @CharlesDuffy fair point, but how do I get as close as possible? I know this can be done since if I redirect `stderr` to `stdout` on the shell, I get the unified output in "good enough" order. – Dun Peal Sep 30 '19 at 16:53
  • 1
    You actually get *perfect* order when you do that on the shell, just the same as you do in Python when you use `stderr=subprocess.STDOUT`, but that doesn't "keep them distinguished". – Charles Duffy Sep 30 '19 at 16:54
  • 1
    ...anyhow, if you want to try to get something as good as you can get -- have a separate thread reading each FD and handling its content as it's received, with the program doing the writes configured in unbuffered or line-buffered mode (how to do that is tool-specific, though on GNU platforms there's `stdbuf`, which will work if the program is sticking with glibc-provided defaults for its output buffering). – Charles Duffy Sep 30 '19 at 16:56
  • 1
    ...running `2>&1` on the shell (or the Python high-level equivalent) duplicates the file descriptors, making FD 2 *point to the same kernelspace object* as FD 1, so the writes are well-ordered, but also impossible to distinguish between. – Charles Duffy Sep 30 '19 at 16:57
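The thread-per-stream approach suggested in the comments can be sketched roughly like this (the child command and the timestamp-based merge are illustrative assumptions; as discussed above, the resulting order is only approximate):

```python
import subprocess
import sys
import threading
import time

def reader(pipe, tag, sink):
    """Read lines from one pipe, tagging each with its stream and arrival time."""
    for line in iter(pipe.readline, b''):
        sink.append((time.monotonic(), tag, line.decode()))
    pipe.close()

# Hypothetical child process that writes to both streams.
proc = subprocess.Popen(
    [sys.executable, "-c",
     "import sys; print('out'); print('err', file=sys.stderr)"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE)

events = []  # list.append is atomic in CPython, so both threads can share it
threads = [threading.Thread(target=reader, args=(proc.stdout, 'STDOUT', events)),
           threading.Thread(target=reader, args=(proc.stderr, 'STDERR', events))]
for t in threads:
    t.start()
for t in threads:
    t.join()
proc.wait()

# Merge by arrival time -- approximate emission order, not a guarantee.
for _, tag, line in sorted(events):
    print(tag, line, end='')
```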

1 Answer


You can use `select.select()` to multiplex the output. Assuming stdout and stderr are being captured in separate pipes (`pipe_stdout` and `pipe_stderr`), code along these lines will work:

import select

inputs = set([pipe_stdout, pipe_stderr])

while inputs:
  readable, _, _ = select.select(inputs, [], [])
  for x in readable:
    line = x.readline()
    if len(line) == 0:
      # EOF on this pipe; stop watching it
      inputs.discard(x)
    elif x is pipe_stdout:
      print('STDOUT', line)
    elif x is pipe_stderr:
      print('STDERR', line)
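Putting it together with `subprocess.Popen`, a self-contained version might look like this on POSIX systems (`select()` does not work on pipes under Windows; the child command here is a hypothetical stand-in):

```python
import select
import subprocess
import sys

# Hypothetical child process that writes one line to each stream.
proc = subprocess.Popen(
    [sys.executable, "-c",
     "import sys; print('out'); print('err', file=sys.stderr)"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

inputs = {proc.stdout, proc.stderr}
log = []  # (tag, line) pairs in the order the lines were read

while inputs:
    readable, _, _ = select.select(inputs, [], [])
    for pipe in readable:
        line = pipe.readline()
        if not line:
            # EOF on this pipe; stop watching it
            inputs.discard(pipe)
        elif pipe is proc.stdout:
            log.append(('STDOUT', line.decode()))
        else:
            log.append(('STDERR', line.decode()))

proc.wait()
for tag, line in log:
    print(tag, line, end='')
```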
Mark Harrison
  • This is true, but it doesn't guarantee unmodified/original ordering. Try having a program that interleaves individual writes to stdout and stderr -- if it does those writes fast enough, you'll have clumps (`stdout`/`stdout`/`stderr`/`stderr`) rather than retaining the interleaved `stdout`/`stderr`/`stdout`/`stderr`. – Charles Duffy Sep 30 '19 at 17:04
  • Without the source program flushing, or modifying the source program to do the tagging, I think this is as good as one can get. – Mark Harrison Sep 30 '19 at 17:09
  • I agree; point I'm making is that the distinction (that "good as one can get" is not "perfect") should be made clear. – Charles Duffy Sep 30 '19 at 17:11
  • @CharlesDuffy I think that's the best we can do, though. Any lack of order you'll get with Mark's solution is the same you'd get by redirecting `stderr` to `stdout` at the shell level as you proposed. So it seems to be the best possible solution. – Dun Peal Sep 30 '19 at 17:12
  • 1
    @DunPeal, ...no, "same you'd get [...] at the shell level" is not true; redirecting stderr to stdout at the shell level gives you stronger guarantees (just as passing `stdout=subprocess.STDERR` to `subprocess.Popen` does), because when you do that the two FDs refer to the same underlying object, so there *does* exist a well-defined serialization between writes to them. – Charles Duffy Sep 30 '19 at 17:12
  • @DunPeal, ...that said, "best possible solution" *is* true, unless you're going to go as far as OS-level hackery to track the individual syscalls and reconstruct order from that. Which I've done, as a proof-of-concept on the above-linked bash question, but that's not actually a practice I'd advise people to deploy in production. :) – Charles Duffy Sep 30 '19 at 17:17