Named pipes efficient asynchronous design

Question

The problem:

Design an efficient, very fast named-pipe client/server framework.

Current state:

I already have a battle-proven, production-tested framework. It is fast, but it uses one thread per pipe connection, so with many clients the number of threads can quickly become too high. I already use a smart thread pool (a task pool, in fact) that scales with demand.

I already use OVERLAPPED mode for the pipes, but I then block with WaitForSingleObject or WaitForMultipleObjects, which is why I need one thread per connection on the server side.

Desired solution:

The client is fine as it is, but on the server side I would like to use a thread only per client request, not per connection. Instead of tying up one thread for the whole lifecycle of a client (connect / disconnect), I would use one thread per task, held only while the client's request is being served.

I saw an example on MSDN that uses an array of OVERLAPPED structures and then calls WaitForMultipleObjects to wait on all of them. I find this a bad design, and I see two problems with it. First, you have to maintain an array that can grow quite large, and deletions will be costly. Second, you need a lot of events, one for each array member.
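As a side note on the deletion cost: removing an entry from a wait array does not have to shift the elements that follow it. A minimal sketch, assuming a hypothetical `handle_table` in which plain `int`s stand in for the event `HANDLE`s and client IDs (so it compiles anywhere), moves the last entry into the hole in O(1). The only ordering effect is that WaitForMultipleObjects, when several handles are signaled, reports the one with the lowest index, so swapping can change which client is serviced first. This does nothing about the 64-handle ceiling itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table: handles[] would be the HANDLE array passed to
   WaitForMultipleObjects; client_ids[] is the parallel per-slot data.
   ints stand in for both so the sketch is portable. */
typedef struct {
    int    handles[64];     /* 64 == MAXIMUM_WAIT_OBJECTS */
    int    client_ids[64];
    size_t count;
} handle_table;

/* O(1) removal: copy the last slot into the removed one. The parallel
   arrays must be moved together so index i still means the same client. */
void table_remove(handle_table *t, size_t index)
{
    assert(index < t->count);
    size_t last = t->count - 1;
    t->handles[index]    = t->handles[last];
    t->client_ids[index] = t->client_ids[last];
    t->count--;
}
```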

I also looked at completion ports (CreateIoCompletionPort and GetQueuedCompletionStatus), but I don't see how they are any better.

What I would like is what ReadFileEx and WriteFileEx do: they call a callback routine when the operation completes. This is a truly asynchronous style of programming. The problem is that ConnectNamedPipe does not support that, and furthermore the thread must be in an alertable state for the callbacks to run, which means calling one of the *Ex wait functions.

So how is such a problem best solved?

Here is how MSDN does it: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365603(v=vs.85).aspx

The problem with this approach is that I can't see how you could have 100 clients connected at once when WaitForMultipleObjects is limited to 64 handles. Sure, I could disconnect the pipe after each request, but the idea is to have a permanent client connection, just like in a TCP server, and to track each client through its whole lifecycle, with each client having a unique ID and client-specific data.
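For reference, the WaitForMultipleObjects documentation describes the usual workaround for more than MAXIMUM_WAIT_OBJECTS handles: break them into groups of MAXIMUM_WAIT_OBJECTS and give each group its own waiting thread. The arithmetic for that partitioning is sketched below; the helper names are hypothetical, and `per_thread` is 64, or 63 if each thread reserves one slot for a wake-up event. Note that this brings back threads, one per 64 clients rather than one per client.

```c
#include <assert.h>
#include <stddef.h>

/* Number of waiter threads needed when each thread can wait on at most
   per_thread handles (64 = MAXIMUM_WAIT_OBJECTS, or 63 with a control event). */
size_t waiter_threads_needed(size_t clients, size_t per_thread)
{
    assert(per_thread > 0);
    return (clients + per_thread - 1) / per_thread;   /* ceiling division */
}

/* Which waiter thread serves client n, and at which index in its array. */
void client_position(size_t n, size_t per_thread, size_t *thread, size_t *slot)
{
    assert(per_thread > 0);
    *thread = n / per_thread;
    *slot   = n % per_thread;
}
```

With 100 clients and 64 slots per thread, two waiter threads are needed, and client 99 lands in slot 35 of the second thread.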

Ideally, the pseudocode would look like this:

repeat
  // wait for the connection or for one client to send data
  Result = ConnectNamedPipe or ReadFile or Disconnect; 

  case Result of
    CONNECTED: CreateNewClient; // we create a new client
    DATA: AssignWorkerThread; // here we process client request in a thread
    DISCONNECT: CleanupAndDeleteClient // release the client object and data
  end;
until Aborted;

This way there is only one listener thread, which accepts connect / disconnect / onData events. The thread pool (worker threads) only processes the actual requests, so five worker threads can serve a large number of connected clients.
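The loop above can be sketched as a single dispatch step that the listener thread calls once per event. This is a portable model only: the event source is left out (in a real server each event would come from a completion-port dequeue or an alertable wait), and the names `event_kind`, `server_state` and `dispatch` are hypothetical. The point it shows is that the listener only updates the client table and hands requests off; it never does the request work itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds mirroring CONNECTED / DATA / DISCONNECT. */
typedef enum { EV_CONNECTED, EV_DATA, EV_DISCONNECT } event_kind;
typedef struct { event_kind kind; int client_id; } event;

#define MAX_CLIENTS 1024

typedef struct {
    int connected[MAX_CLIENTS]; /* 1 while the client is connected */
    int live_clients;
    int tasks_queued;           /* requests handed to the worker pool */
} server_state;

/* Handle exactly one event on the listener thread. */
void dispatch(server_state *s, const event *e)
{
    assert(e->client_id >= 0 && e->client_id < MAX_CLIENTS);
    switch (e->kind) {
    case EV_CONNECTED:                      /* CreateNewClient */
        if (!s->connected[e->client_id]) {
            s->connected[e->client_id] = 1;
            s->live_clients++;
        }
        break;
    case EV_DATA:                           /* AssignWorkerThread */
        if (s->connected[e->client_id])
            s->tasks_queued++;              /* stand-in for submitting a pool task */
        break;
    case EV_DISCONNECT:                     /* CleanupAndDeleteClient */
        if (s->connected[e->client_id]) {
            s->connected[e->client_id] = 0;
            s->live_clients--;
        }
        break;
    }
}
```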

P.S. My current code should not matter. I write this in Delphi, but it's pure WinAPI, so the language does not matter.

EDIT:

For now, IOCP looks like the solution:

I/O completion ports provide an efficient threading model for processing multiple asynchronous I/O requests on a multiprocessor system. When a process creates an I/O completion port, the system creates an associated queue object for requests whose sole purpose is to service these requests. Processes that handle many concurrent asynchronous I/O requests can do so more quickly and efficiently by using I/O completion ports in conjunction with a pre-allocated thread pool than by creating threads at the time they receive an I/O request.
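A rough, portable model of what the port provides: one queue that any number of handles feed into, where each completion packet carries the completion key you supplied to CreateIoCompletionPort when associating that pipe handle, and GetQueuedCompletionStatus returns that key together with the byte count. A worker therefore goes straight from a completion to the client's context with no array search. Everything below is hypothetical (`client_ctx`, `cq_post`, `cq_get`); the model uses a simple FIFO, omits the OVERLAPPED pointer, and returns instead of blocking when the queue is empty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-client context; its address would be the completion key. */
typedef struct { int id; unsigned long bytes_received; } client_ctx;

/* One completion packet: completion key + bytes transferred. */
typedef struct { client_ctx *key; unsigned long bytes; } packet;

#define QUEUE_CAP 256
typedef struct { packet buf[QUEUE_CAP]; size_t head, count; } completion_queue;

/* Producer side (the kernel, in the real thing): an I/O finished. */
int cq_post(completion_queue *q, client_ctx *key, unsigned long bytes)
{
    if (q->count == QUEUE_CAP) return 0;               /* queue full */
    q->buf[(q->head + q->count) % QUEUE_CAP] = (packet){ key, bytes };
    q->count++;
    return 1;
}

/* Worker side: take the oldest packet; the key identifies the client. */
int cq_get(completion_queue *q, packet *out)
{
    if (q->count == 0) return 0;                       /* real GQCS would block */
    *out = q->buf[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 1;
}
```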

Also be aware that [`WaitForMultipleObjects`](http://msdn.microsoft.com/en-us/library/windows/desktop/ms687025) is limited to 64 handles (MAXIMUM_WAIT_OBJECTS)! — Jochen Kalmbach, Jul 23 '13 at 09:11
Deletions are not costly. What does that even mean? I vote for WFMO on overlapped structures; I see nothing bad about it. You just need one more event in the wait array to interrupt the wait when the array needs adjusting or a complete abort is in progress. — Dialecticus, Jul 23 '13 at 09:12
@Dialecticus: Deletions on a dynamic array are costly. If you delete an item in the middle, you have to move all the items that follow it. This is data structures 101 :) OK, sure, with fewer than 100 items it will not show. But I can have more than 100 clients connected. — Runner, Jul 23 '13 at 09:25
Maybe the simple solution is to connect to and disconnect from the pipe server for each request. That way all the problems go away, because I need threads not for each connection but only for each request that runs simultaneously. I can maintain the client state internally as if it were connected the whole time. — Runner, Jul 23 '13 at 09:45
Deletions are not costly when working with WFMO because WFMO can handle only 64 handles. If you need more than 64 then use IOCP, or better yet some networking framework that will do that for you. — Dialecticus, Jul 23 '13 at 10:38
@Dialecticus: On this I agree. I will look at IOCP again. Networking is not an option: I already have the same type of framework based on TCP and it works fine, but I need IPC, and it needs to be as fast as possible. — Runner, Jul 23 '13 at 10:46
Although I'm still a horrible beginner, I recently also had to decide whether to implement IPC under Windows with pipes, sockets, or something else. I went with sockets. Named pipes didn't look very promising under Windows, especially when you are trying to communicate over a network. I have also heard that if you're just communicating over sockets locally, Windows skips wrapping the data into TCP packets and Ethernet frames to speed things up. I can't, however, back that up with a source. Overall, I'm very happy with the socket route I took. — Günther the Beautiful, Jul 23 '13 at 10:51
[boost::asio](http://www.boost.org/doc/libs/1_46_1/doc/html/boost_asio.html) supports both [IOCP and named pipes](http://www.boost.org/doc/libs/1_46_1/doc/html/boost_asio/overview/windows/stream_handle.html). — Dialecticus, Jul 23 '13 at 10:58
@Günther the Beautiful: I have done tests. My IPC can process a full-blown request with an average amount of data and send the response in around 0.1 ms. That is the whole client-server-client cycle. TCP can do it in around 1 ms, if I remember correctly. That is a huge difference in my case, and IPC also does not consume ports, even locally. Behind the scenes, IPC works with memory-mapped files on Windows. — Runner, Jul 23 '13 at 11:03
@Dialecticus: Thanks, but I use Delphi. However, I read the IOCP MSDN entries again and I think it is exactly what I need. I missed the fact that a single IOCP can hold an unlimited (in theory) number of pipe handles. This means I can have only one processing thread that then delegates the tasks further as they become available. If you care to, you can write an answer with a more detailed explanation :) — Runner, Jul 23 '13 at 11:05
Who voted to close? Three close votes on what is, to me, a perfectly valid question. Six upvotes on the question and two favorites confirm that. Stack Overflow is getting stranger by the day. — Runner, Jul 24 '13 at 06:03

score 4 · Accepted Answer · answered Jul 23 '13 at 11:26

If the server must handle more than 64 events (reads/writes), then any solution using WaitForMultipleObjects becomes infeasible. This is the reason Microsoft introduced I/O completion ports to Windows. They can handle a very high number of I/O operations using the most appropriate number of threads (usually the number of processors/cores).

The problem with IOCP is that it is very difficult to implement correctly. Hidden issues are scattered like mines in a field: [1], [2] (section 3.6). I would recommend using a framework. A little googling turns up something called Indy for Delphi developers. There may be others.

At this point I would disregard the requirement for named pipes if that means coding my own IOCP implementation. It's not worth the grief.

Thanks for the answer. I think it is the correct one. I know Indy, and I have the same type of framework built on it as I have for my IPC. I have two implementations of my framework: one is based on named pipes (IPC), and the other is IMC (inter-machine communication) and is based on Indy. I will heed your warning, but I will still try the implementation. If it does not work out, at least I will learn something new. — Runner, Jul 23 '13 at 11:33

score 1 · Answer 2 · answered Jul 24 '13 at 04:46

I think what you're overlooking is that you only need a few listening named pipe instances at any given time. Once a pipe instance has connected, you can spin that instance off and create a new listening instance to replace it.
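The replace-on-connect idea can be modeled with a fixed array of listening slots. In this hypothetical sketch, integer IDs stand in for pipe instance handles, a counter stands in for CreateNamedPipe, and `LISTEN_SLOTS` is 4 to keep the example small (the answer uses MAXIMUM_WAIT_OBJECTS). It shows that only the number of pending listeners is bounded by the slot count, not the number of connected clients.

```c
#include <assert.h>

#define LISTEN_SLOTS 4   /* MAXIMUM_WAIT_OBJECTS (64) in the answer's design */

/* Each slot holds the id of an instance waiting in ConnectNamedPipe. */
typedef struct {
    int slot_instance[LISTEN_SLOTS];
    int next_instance_id;   /* stand-in for CreateNamedPipe */
    int handed_off;         /* connected instances given to the task pool */
} listener;

void listener_init(listener *l)
{
    l->next_instance_id = 0;
    l->handed_off = 0;
    for (int i = 0; i < LISTEN_SLOTS; i++)
        l->slot_instance[i] = l->next_instance_id++;
}

/* A client connected on slot i: spin the connected instance off and put a
   fresh listening instance in the same slot. Returns the connected id. */
int listener_on_connect(listener *l, int i)
{
    assert(i >= 0 && i < LISTEN_SLOTS);
    int connected = l->slot_instance[i];
    l->handed_off++;                              /* e.g. create a pool task */
    l->slot_instance[i] = l->next_instance_id++;  /* replacement listener */
    return connected;
}
```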

With MAXIMUM_WAIT_OBJECTS (or fewer) listening named pipe instances, you can have a single thread dedicated to listening using WaitForMultipleObjectsEx. The same thread can also handle the rest of the I/O using ReadFileEx and WriteFileEx and APCs. The worker threads would queue APCs to the I/O thread in order to initiate I/O, and the I/O thread can use the task pool to return the results (as well as letting the worker threads know about new connections).
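The "worker threads queue APCs to the I/O thread" part can be modeled portably. With the real API, a worker calls QueueUserAPC targeting the I/O thread, and the queued routine runs only when that thread enters an alertable wait such as WaitForMultipleObjectsEx with bAlertable set to TRUE, which then returns WAIT_IO_COMPLETION. In this single-threaded sketch `queue_apc` and `alertable_wait` are hypothetical stand-ins, `uintptr_t` plays the role of ULONG_PTR, and `start_read` is a hypothetical APC body that in the real design would call ReadFileEx on the identified connection. The model drains the whole queue in FIFO order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as PAPCFUNC: void fn(ULONG_PTR). */
typedef void (*apc_fn)(uintptr_t param);

#define APC_CAP 64
typedef struct { apc_fn fn[APC_CAP]; uintptr_t param[APC_CAP]; size_t count; } apc_queue;

/* Worker side: stand-in for QueueUserAPC(fn, io_thread, param). */
int queue_apc(apc_queue *q, apc_fn fn, uintptr_t param)
{
    if (q->count == APC_CAP) return 0;
    q->fn[q->count] = fn;
    q->param[q->count] = param;
    q->count++;
    return 1;
}

/* I/O thread side: run queued APCs (including any queued by an APC while
   draining) and report whether any ran, i.e. "WAIT_IO_COMPLETION". */
int alertable_wait(apc_queue *q)
{
    size_t i;
    for (i = 0; i < q->count; i++)
        q->fn[i](q->param[i]);
    q->count = 0;
    return i > 0;
}

/* Hypothetical APC body: would start ReadFileEx on connection 'param'. */
static uintptr_t last_read_started = 0;
static unsigned reads_started = 0;
static void start_read(uintptr_t connection_id)
{
    last_read_started = connection_id;
    reads_started++;
}
```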

The I/O thread main function would look something like this:

create_events();  /* manual-reset events, as ConnectNamedPipe's OVERLAPPED requires */
for (index = 0; index < MAXIMUM_WAIT_OBJECTS; index++) new_pipe(index);  /* start one listener per slot */

for (;;)
{
    if (service_stopping && active_instances == 0) break;

    result = WaitForMultipleObjectsEx(MAXIMUM_WAIT_OBJECTS, connect_events, 
                    FALSE, INFINITE, TRUE);

    if (result == WAIT_IO_COMPLETION) 
    {
        continue;
    }
    else if (result >= WAIT_OBJECT_0 && 
                     result < WAIT_OBJECT_0 + MAXIMUM_WAIT_OBJECTS) 
    {
        index = result - WAIT_OBJECT_0;
        ResetEvent(connect_events[index]);

        if (GetOverlappedResult(
                connect_handles[index], &connect_overlapped[index], 
                &byte_count, FALSE))
        {
            err = ERROR_SUCCESS;
        }
        else
        {
            err = GetLastError();
        }

        connect_pipe_completion(index, err);
        continue;
    }
    else
    {
        fail();
    }
}

The only real complication is that ConnectNamedPipe can complete without going pending. It may fail with GetLastError returning ERROR_PIPE_CONNECTED, meaning a client connected before the call and the connection has in fact succeeded immediately, or with an error other than ERROR_IO_PENDING if the call failed immediately. In either case you need to reset the event and then handle the connection:

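/* Also used as an APC routine (PAPCFUNC), so it needs the CALLBACK convention. */
VOID CALLBACK new_pipe(ULONG_PTR dwParam)
{
    DWORD index = (DWORD)dwParam;
    DWORD err;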

    connect_handles[index] = CreateNamedPipe(
        pipe_name, 
        PIPE_ACCESS_DUPLEX | FILE_FLAG_OVERLAPPED,
        PIPE_TYPE_MESSAGE | PIPE_WAIT | PIPE_ACCEPT_REMOTE_CLIENTS,
        MAX_INSTANCES,
        512,
        512,
        0,
        NULL);

    if (connect_handles[index] == INVALID_HANDLE_VALUE) fail();

    ZeroMemory(&connect_overlapped[index], sizeof(OVERLAPPED));
    connect_overlapped[index].hEvent = connect_events[index];

    if (ConnectNamedPipe(connect_handles[index], &connect_overlapped[index])) 
    {
        err = ERROR_SUCCESS;
    }
    else
    {
        err = GetLastError();

        if (err == ERROR_SUCCESS) err = ERROR_INVALID_FUNCTION;

        if (err == ERROR_PIPE_CONNECTED) err = ERROR_SUCCESS;
    }

    if (err != ERROR_IO_PENDING) 
    {
        ResetEvent(connect_events[index]);
        connect_pipe_completion(index, err);
    }
}

The connect_pipe_completion function would create a new task in the task pool to handle the newly connected pipe instance, and then queue an APC to call new_pipe to create a new listening pipe at the same index.

It is possible to reuse existing pipe instances once their clients have disconnected, but in this situation I don't think it's worth the hassle.

Yes, this is possible. In fact, I have something like this at the moment. But what I don't like about it is that each new connection then holds a thread from the pool for the whole duration of the connection. If you have 1000 connections, it would use 1000 threads, and that is a very bad design. I could connect and disconnect for each request, but then I cannot keep a list of active connections and clients on the server side. I lose track of clients and I cannot have duplex communication channels. — Runner, Jul 24 '13 at 05:55
By the way, a single request is always fast. Requests are packets of binary data. My framework is very high-level and hides the pipes from the user. Look at the code here: http://www.cromis.net/blog/downloads/cromis-ipc/ — Runner, Jul 24 '13 at 05:57
No, in my design you do not use one thread per connection. When a connection occurs a task is created in the task pool; this task does any necessary pre-processing (such as checking access rules) and then initiates a `ReadFileEx` or `WriteFileEx` as appropriate (by queuing an APC to the I/O thread) and then exits. When that I/O completes, another task is created to process that, and so on. — Harry Johnston, Jul 24 '13 at 20:48
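The flow described in this comment can be read as a small per-connection state machine. The sketch below is one hypothetical encoding of it; the state and event names are illustrative, and in the real design the transitions would be driven by ReadFileEx/WriteFileEx completion routines and by pool tasks finishing. Only `STATE_PROCESSING` occupies a pool thread; the pending states are just data, which is why a few workers can serve many connections.

```c
#include <assert.h>

/* Hypothetical per-connection states. */
typedef enum {
    STATE_READ_PENDING,   /* ReadFileEx issued, no thread held */
    STATE_PROCESSING,     /* a pool task is handling the request */
    STATE_WRITE_PENDING,  /* WriteFileEx issued, no thread held */
    STATE_CLOSED
} conn_state;

typedef enum {
    EV_READ_DONE,    /* ReadFileEx completion routine ran */
    EV_TASK_DONE,    /* pool task finished building the response */
    EV_WRITE_DONE,   /* WriteFileEx completion routine ran */
    EV_PIPE_ERROR    /* e.g. the client went away */
} conn_event;

/* Events that do not apply in the current state are ignored. */
conn_state next_state(conn_state s, conn_event e)
{
    if (e == EV_PIPE_ERROR) return STATE_CLOSED;
    switch (s) {
    case STATE_READ_PENDING:  return e == EV_READ_DONE  ? STATE_PROCESSING    : s;
    case STATE_PROCESSING:    return e == EV_TASK_DONE  ? STATE_WRITE_PENDING : s;
    case STATE_WRITE_PENDING: return e == EV_WRITE_DONE ? STATE_READ_PENDING  : s;
    default:                  return STATE_CLOSED;
    }
}
```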
OK, I understand now. Yes, this is something I could do while avoiding IOCP, and it would work the way I want. Thanks. It basically still uses IOCP, as they are built into ReadFileEx, but this way I don't have to deal with them manually. — Runner, Jul 25 '13 at 07:04
