
This is almost the same as a question I asked here previously, but in that question I wanted to know: is ignoring the stdout of a child process dangerous?

The problem described here is part of the reason why I asked that question.

My Application

I have an application that starts 1-45 child processes (the number is job dependent) and communicates with them via named pipes. I'm only interested in the stderr of the child processes and have no need to redirect their stdout.

I have written code that does exactly this: I redirect stderr but not stdout, and everything works as expected. By that I mean I receive the redirected stderr, I see no stdout appearing on the parent process's console, and I get no deadlocks from leaving stdout unredirected.

Concern

I was concerned (as outlined in the previous question) that not redirecting the output of chatty child processes could fill up an internal buffer, causing the child processes to block while attempting to write to stdout.
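For illustration, here is the classic variant of this blocking concern: stdout is redirected but never drained, so once the OS pipe buffer fills, a chatty child blocks inside its own writes to stdout. This is a minimal sketch; the executable name is hypothetical.

using System.Diagnostics;

var psi = new ProcessStartInfo( "chatty-child.exe" ) // hypothetical chatty child
{
    UseShellExecute = false,
    RedirectStandardOutput = true // redirected, but never read below
};

using var process = Process.Start( psi )!;

// May never return: once the pipe buffer fills, the child blocks
// writing to stdout, and nothing here ever drains it.
process.WaitForExit( );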

Problem

To circumvent this issue I decided to redirect the child processes' stdout and simply discard the output.

When running with a few processes this is fine, but as soon as I increase the number of spawned processes, the time to write a message over the named pipes I use to communicate with the child processes (in both directions) increases dramatically. When not redirecting stdout the write times are usually < 1 ms, but after redirecting stdout they jump to 2-4 seconds...

I decided to write a Minimal, Reproducible Example to see whether I could reproduce the issue (I could). While using this example I realized I could recreate it even when the child processes write only a minimal amount of text to stdout.

I should also note that CPU utilization never surpasses 20% during this example.

ProcessExtensions

This is the well-known Process extension for asynchronously waiting for the child process to exit.

public static class ProcessExtensions
{
    public static async Task WaitForExitAsync( this Process process,
        CancellationToken token = default )
    {
        var tcs = new TaskCompletionSource<bool>( );

        // Subscribe before checking HasExited so the exit event can't be missed.
        process.EnableRaisingEvents = true;
        process.Exited += handler;

        try
        {
            if ( process.HasExited ) return;
            using ( token.Register( ( ) => tcs.TrySetCanceled( ) ) )
                await tcs.Task.ConfigureAwait( false );
        }
        finally
        {
            process.Exited -= handler;
        }

        void handler( object? sender, EventArgs args ) =>
            tcs.TrySetResult( true );
    }
}
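A typical call site might look like this (a sketch; on cancellation the await throws a TaskCanceledException):

// Give up waiting after 30 seconds.
using var cts = new CancellationTokenSource( TimeSpan.FromSeconds( 30 ) );
await process.WaitForExitAsync( cts.Token ).ConfigureAwait( false );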

ChildProcess

I use the ChildProcess class to wrap the Process class.

public class ChildProcess
{
    public async Task StartProcessAsync( string executablePath, 
        int id, CancellationToken token = default )
    {
        using var process = new Process( );

        try
        {
            process.StartInfo.FileName = executablePath;                
            process.StartInfo.UseShellExecute = false;
            process.StartInfo.ErrorDialog = false;
            process.StartInfo.CreateNoWindow = true;
            process.StartInfo.RedirectStandardError = true;
            process.StartInfo.RedirectStandardOutput = true;

            await StartProcess( process, StdErrorOutput, token ).ConfigureAwait( false );
            Console.WriteLine( $"Child process {id} exited with code: {process.ExitCode}" );
        }
        finally
        {
            if ( !process.HasExited )
                process.Kill( );
        }

        void StdErrorOutput( object sender, DataReceivedEventArgs args )
        {
            if ( string.IsNullOrWhiteSpace( args.Data ) ) return;
            Console.WriteLine( $"Std Error: {args.Data}" );
        }
    }

    private Task StartProcess( Process process,
        DataReceivedEventHandler? stderr, CancellationToken token = default )
    {
        return Task.Run( async ( ) =>
        {
            try
            {
                if ( stderr is object )
                    process.ErrorDataReceived += stderr;

                if ( !process.Start( ) )
                    throw new ApplicationException( $"Failed to start '{process.StartInfo.FileName}'" );

                if ( stderr is object )
                    process.BeginErrorReadLine( );

                var task = DumpOutput( );
                await process.WaitForExitAsync( token ).ConfigureAwait( false );
                await task.ConfigureAwait( false );
            }
            finally
            {
                SafeKill( );

                if ( stderr is object )
                    process.ErrorDataReceived -= stderr;
            }

            Task DumpOutput( )
            {
                return Task.Run( async ( ) => 
                {
                    while ( !process.HasExited )
                        await process.StandardOutput.ReadLineAsync( );
                } );
            }

            void SafeKill( )
            {
                try
                {
                    if ( !process.HasExited )
                    {
                        process.Refresh( );

                        if ( stderr is object )
                            process.CancelErrorRead( );

                        process.Kill( );
                    }
                }
                catch ( Exception ex )
                {
                    Console.WriteLine( ex.Message );
                }
            }

        } );
    }
}

Here you can see the DumpOutput function, which just reads stdout and discards it. This is only one iteration of what I've tried.

I have also tried asynchronous reading using BeginOutputReadLine and providing the associated event handler (roughly as sketched below). I have tried completely independent (non-awaited) loops running in a background thread. I've tried the different methods for reading from a Stream, e.g. ReadToEndAsync. Of these I've tried minor variations (note the Task.Run in the DumpOutput function) to see whether I could get different results.
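A minimal sketch of that BeginOutputReadLine variant (the executable name is hypothetical; the handler simply discards each line):

using System.Diagnostics;

using var process = new Process( );
process.StartInfo.FileName = "client.exe"; // hypothetical client
process.StartInfo.UseShellExecute = false;
process.StartInfo.RedirectStandardOutput = true;

// Discard each line as it arrives; lines are delivered on threadpool threads.
process.OutputDataReceived += ( sender, args ) => { };

process.Start( );
process.BeginOutputReadLine( );
process.WaitForExit( );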

But the results are always the same: if I redirect stdout, my communication over named pipes slows down dramatically, at least for the first batch of messages (the size of which depends on the number of started clients).

This is killing the throughput of my application for smaller jobs.

Server

This is the server from the reproducible example I created. I'm only going to show Main here, as showing everything would be far too much code.

static async Task Main( string[ ] args )
{
    var clients = new List<PipeConnection>( PipeInstances );
    var server = new PipeServer( "TestPipe", PipeInstances );

    var connectionsTask = HandleConnectionsAsync( );

    var processes = new List<ChildProcess>( PipeInstances );
    var processTasks = new List<Task>( PipeInstances );

    for ( var i = 0; i < PipeInstances; i++ )
    {
        var child = new ChildProcess( );
        processes.Add( child );
        processTasks.Add( child.StartProcessAsync( ManagedClient, i ) );
    }

    await connectionsTask.ConfigureAwait( false );

    var timeout = DateTime.Now.AddSeconds( 10 );
    while ( DateTime.Now < timeout )
    {
        foreach ( var client in clients )
        {
            var sw = Stopwatch.StartNew( );

            await client.WriteMessageAsync( "Hello" ).ConfigureAwait( false );

            sw.Stop( );
            Console.WriteLine( $"Time to send message: {sw.ElapsedMilliseconds} ms" );

            await client.ReadAsync( ).ConfigureAwait( false );
        }
        await Task.Delay( 1000 ).ConfigureAwait( false );
    }

    await BroadcastMessageAsync( "Cancel" ).ConfigureAwait( false );
    await Task.WhenAll( processTasks ).ConfigureAwait( false );

    DisposeClients( );
}

PipeConnection

Here I'm only going to show the WriteMessageAsync method, to keep the amount of code to a minimum.

public async Task<int> WriteMessageAsync( string message, CancellationToken token = default )
{
    var buffer = Encoding.ASCII.GetBytes( message );
    var msgLength = BitConverter.GetBytes( buffer.Length );

    // Frame: 4-byte length prefix (BitConverter byte order), then the ASCII payload.
    await _pipe.WriteAsync( msgLength, 0, msgLength.Length, token ).ConfigureAwait( false );
    await _pipe.WriteAsync( buffer, 0, buffer.Length, token ).ConfigureAwait( false );
    await _pipe.FlushAsync( token ).ConfigureAwait( false );

    // Total bytes written: the payload plus the 4-byte prefix.
    return buffer.Length + 4;
}
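For context, the read side mirrors this framing. A sketch of what a matching ReadMessageAsync might look like in the same class (ReadExactAsync is a hypothetical helper; it loops because a pipe read can return fewer bytes than requested):

public async Task<string> ReadMessageAsync( CancellationToken token = default )
{
    // Read the 4-byte length prefix, then the payload it describes.
    var prefix = await ReadExactAsync( 4, token ).ConfigureAwait( false );
    var length = BitConverter.ToInt32( prefix, 0 );

    var payload = await ReadExactAsync( length, token ).ConfigureAwait( false );
    return Encoding.ASCII.GetString( payload );
}

private async Task<byte[ ]> ReadExactAsync( int count, CancellationToken token )
{
    var buffer = new byte[ count ];
    var offset = 0;

    while ( offset < count )
    {
        var read = await _pipe.ReadAsync( buffer, offset, count - offset, token ).ConfigureAwait( false );
        if ( read == 0 )
            throw new EndOfStreamException( "Pipe closed mid-message." );
        offset += read;
    }

    return buffer;
}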

I won't be showing the client code, as I don't believe it's relevant. I have two flavors of client, one written in C# and one written in C++. In either case the results are the same.

Question

Why does redirecting stdout affect the performance of writing over named pipes?

WBuck
  • I would guess your problem is not named pipes or stdout - but that you have up to **45 child processes** | Child processes are a really primitive way of implementing multitasking, multithreading in particular. The one with the highest isolation, but worst performance. If you could turn some of those child processes into threads in the main application, that should give you a relevant speedup. | Another person might have a more efficient IPC approach, but that would only mean the problem re-appears at 90 or 200 child processes. – Christopher May 21 '20 at 11:49
  • Exception handling is a pet-peeve of mine, and yours needs work. Catching Exception is a big mistake in itself. Only logging the message wastes 95% of the Exception's debugging information. But at least you got no async voids. | I have two articles on the matter I link often: https://blogs.msdn.microsoft.com/ericlippert/2008/09/10/vexing-exceptions/ | https://www.codeproject.com/Articles/9538/Exception-Handling-Best-Practices-in-NET – Christopher May 21 '20 at 11:52
  • @Christopher The `processes` are a necessity. They run third-party application code which requires a license, and you can only use 1 license per process. That being said, if what you were saying were true, then I should see the same performance impact regardless of whether I redirect `stdout` or not (which is not the case). Also note this is not production code; this is a minimal example and only used for illustration. If the question was about how to handle errors your input would be welcome, but that is not the question I asked – WBuck May 21 '20 at 11:52
  • Transferring more data is most definitely not *improving* performance. Even turning piecemeal transfer into one request will not increase the data amount. However, with all things performance/speed, the Speed Rant applies: https://ericlippert.com/2012/12/17/performance-rant/ – Christopher May 21 '20 at 11:54
  • "I should also note that CPU utilization never surpasses 20% during this example." We need to be precise here: **overall** CPU utilisation or the one of **1 core**? Because if you got 8 cores, one fully taxed out one reads only as 12.5%. If the main process taxes out one core (as you are using only async) and the other use another 7.5% split over the remainign cores, you got a odd CPU bound case. – Christopher May 21 '20 at 12:01
  • Overall `CPU` utilization. The workload is fairly evenly spread across all cores, with no unusual spikes. Of the `4` cores in my `dev` PC none of them are pinned. – WBuck May 21 '20 at 12:06
  • @Christopher your comment "as you are using only async" made me realize I needed to spread the workload out a bit more (specifically when dealing with `stdout`). When I use a dedicated `Thread` instead of `async` for reading `stdout` the issue is resolved (see the sketch after these comments). – WBuck May 21 '20 at 12:23
  • async is wonderful and I do not miss the times when we did not have it, as it is multitasking without needing to resort to multithreading or writing all that cooperative multitasking code ourselves. | But in your case, at your scale, it definitely stopped working. I am happy I could help you. | Just a warning: make certain you do not swallow exceptions with multithreading. It is even easier than it already was with async. I know I sound a bit paranoid about that, but I have seen one program too many that made this mistake :) – Christopher May 21 '20 at 12:30
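Based on that resolution, a minimal sketch of draining stdout on a dedicated thread rather than via async (assuming the same redirected process setup as in ChildProcess above; the thread name is hypothetical):

// Drain and discard stdout on a dedicated thread so the reads never
// compete with the async named pipe I/O for threadpool threads.
var drain = new Thread( ( ) =>
{
    // ReadLine returns null once the child closes its stdout.
    while ( process.StandardOutput.ReadLine( ) != null )
    {
        // Discard the line.
    }
} )
{
    IsBackground = true,
    Name = "StdOutDrain" // hypothetical name
};

drain.Start( );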
