
I'm new to zmq and cppzmq, and I'm trying to run the multithreaded example from the official guide: http://zguide.zeromq.org/cpp:mtserver

My setup

  • macOS Mojave, Xcode 10.3
  • libzmq 4.3.2 via Homebrew
  • cppzmq GitHub HEAD

I hit a few problems.

Problem 1

When I run the source code from the guide, it hangs forever without printing anything to stdout.

Here is the code directly copied from the Guide.

/*
    Multithreaded Hello World server in C
*/

#include <pthread.h>
#include <unistd.h>
#include <cassert>
#include <string>
#include <iostream>
#include <zmq.hpp>

void *worker_routine (void *arg)
{
    zmq::context_t *context = (zmq::context_t *) arg;

    zmq::socket_t socket (*context, ZMQ_REP);
    socket.connect ("inproc://workers");

    while (true) {
        //  Wait for next request from client
        zmq::message_t request;
        socket.recv (&request);
        std::cout << "Received request: [" << (char*) request.data() << "]" << std::endl;

        //  Do some 'work'
        sleep (1);

        //  Send reply back to client
        zmq::message_t reply (6);
        memcpy ((void *) reply.data (), "World", 6);
        socket.send (reply);
    }
    return (NULL);
}

int main ()
{
    //  Prepare our context and sockets
    zmq::context_t context (1);
    zmq::socket_t clients (context, ZMQ_ROUTER);
    clients.bind ("tcp://*:5555");
    zmq::socket_t workers (context, ZMQ_DEALER);
    workers.bind ("inproc://workers");

    //  Launch pool of worker threads
    for (int thread_nbr = 0; thread_nbr != 5; thread_nbr++) {
        pthread_t worker;
        pthread_create (&worker, NULL, worker_routine, (void *) &context);
    }
    //  Connect work threads to client threads via a queue
    zmq::proxy (static_cast<void*>(clients),
                static_cast<void*>(workers),
                nullptr);
    return 0;
}

It also crashes soon after I put a breakpoint in the worker's while loop.

Problem 2

Noticing that the compiler prompted me to replace deprecated API calls, I modified the above sample code to make the warnings disappear.

/*
 Multithreaded Hello World server in C
 */

#include <pthread.h>
#include <unistd.h>
#include <array>      //  for std::array used as the receive buffer
#include <cassert>
#include <cstring>    //  for memcpy
#include <string>
#include <iostream>
#include <cstdio>
#include <zmq.hpp>

void *worker_routine (void *arg)
{
    zmq::context_t *context = (zmq::context_t *) arg;

    zmq::socket_t socket (*context, ZMQ_REP);
    socket.connect ("inproc://workers");

    while (true) {
        //  Wait for next request from client
        std::array<char, 1024> buf{'\0'};
        zmq::mutable_buffer request(buf.data(), buf.size());
        socket.recv(request, zmq::recv_flags::dontwait);
        std::cout << "Received request: [" << (char*) request.data() << "]" << std::endl;

        //  Do some 'work'
        sleep (1);

        //  Send reply back to client
        zmq::message_t reply (6);
        memcpy ((void *) reply.data (), "World", 6);
        try {
            socket.send (reply, zmq::send_flags::dontwait);
        }
        catch (zmq::error_t& e) {
            printf("ERROR: %X\n", e.num());
        }
    }
    return (NULL);
}

int main ()
{
    //  Prepare our context and sockets
    zmq::context_t context (1);
    zmq::socket_t clients (context, ZMQ_ROUTER);
    clients.bind ("tcp://*:5555");  // who i talk to.
    zmq::socket_t workers (context, ZMQ_DEALER);
    workers.bind ("inproc://workers");

    //  Launch pool of worker threads
    for (int thread_nbr = 0; thread_nbr != 5; thread_nbr++) {
        pthread_t worker;
        pthread_create (&worker, NULL, worker_routine, (void *) &context);
    }
    //  Connect work threads to client threads via a queue
    zmq::proxy (clients, workers);
    return 0;
}

I'm not claiming this is a literal translation of the original (broken) example; it's just my attempt to make things compile and run without obvious memory errors.

This code keeps giving me error number 156384763 (9523DFB in hex) from the try-catch block. I can't find the definition of that error number in the official docs, but gathered from this question that it's the native ZeroMQ error EFSM:

The zmq_send() operation cannot be performed on this socket at the moment due to the socket not being in the appropriate state. This error may occur with socket types that switch between several states, such as ZMQ_REP.
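
For what it's worth, a quick sanity check I put together (a sketch of my own, relying only on zmq.h's EFSM macro and zmq_strerror()) maps the number back to a readable message, and 156384763 is indeed 0x9523DFB:

// check_efsm.cpp -- my own hypothetical snippet, not part of the guide
#include <zmq.h>      //  zmq.h defines the native 0MQ error code EFSM as (ZMQ_HAUSNUMERO + 51)
#include <cstdio>

int main ()
{
    //  Print the numeric code in decimal and hex, plus libzmq's own description of it
    std::printf ("EFSM = %d (0x%X): %s\n", EFSM, EFSM, zmq_strerror (EFSM));
    return 0;
}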

I'd appreciate it if anyone could point out what I did wrong.

UPDATE

I tried polling according to @user3666197's suggestion, but the program still hangs. Inserting any breakpoint effectively crashes the program, making it difficult to debug.

Here is the new worker code

void *worker_routine (void *arg)
{
    zmq::context_t *context = (zmq::context_t *) arg;

    zmq::socket_t socket (*context, ZMQ_REP);
    socket.connect ("inproc://workers");

    zmq::pollitem_t items[1] = { { socket, 0, ZMQ_POLLIN, 0 } };

    while (true) {
        if(zmq::poll(items, 1, -1) < 1) {
            printf("Terminating worker\n");
            break;
        }

        //  Wait for next request from client
        std::array<char, 1024> buf{'\0'};
        socket.recv(zmq::buffer(buf), zmq::recv_flags::none);
        std::cout << "Received request: [" << (char*) buf.data() << "]" << std::endl;

        //  Do some 'work'
        sleep (1);

        //  Send reply back to client
        zmq::message_t reply (6);
        memcpy ((void *) reply.data (), "World", 6);
        try {
            socket.send (reply, zmq::send_flags::dontwait);
        }
        catch (zmq::error_t& e) {
            printf("ERROR: %s\n", e.what());
        }
    }
    return (NULL);
}
kakyo
  • Regarding the error number, have you tried printing it as hexadecimal? Perhaps it's a bitmask? – Some programmer dude Sep 13 '19 at 08:50
  • @Someprogrammerdude Hi, I just updated my question with the hex error. – kakyo Sep 13 '19 at 08:57
  • Are you running both server and client? Server will just sit there and wait for the client to send something. – 500 - Internal Server Error Sep 13 '19 at 09:01
  • @500-InternalServerError Yes, I believe that's the story the sample code sells. Well, this is the sample code, so I expected it to run without issues. Also, this is in-process multithreading, so I expect that server/client are just a facade over the underlying thread coordination. Am I wrong? – kakyo Sep 13 '19 at 09:08
  • @500-InternalServerError I checked the definition of the ZMQ_REP socket `ZMQ_REP A socket of type ZMQ_REP is used by a service to receive requests from and send replies to a client. This socket type allows only an alternating sequence of zmq_recv(request) and subsequent zmq_send(reply) calls. Each request received is fair-queued from among all clients, and each reply sent is routed to the client that issued the last request. If the original requester does not exist any more the reply is silently discarded.` It seems the sample code intent is valid. – kakyo Sep 13 '19 at 09:29

2 Answers


Welcome to the domain of the Zen-of-Zero

Suspect #1: the code jumps straight into an unresolvable live-lock, due to a move into an ill-directed state of the distributed Finite-State-Automaton:

While I have always advocated preferring non-blocking .recv()-s, the code above simply commits suicide right at this step:

socket.recv( request, zmq::recv_flags::dontwait ); // socket being == ZMQ_REP

With ::dontwait, that .recv() returns immediately and empty-handed whenever no request has arrived yet, and the code then marches on to the .send() anyway. That kills all chances for any other future life but the very error The zmq_send() operation cannot be performed on this socket at the moment due to the socket not being in the appropriate state, as going into the .send()-able state is possible if and only if a previous .recv() has delivered a real message.


The Best Next Step :

Review the code and either use the blocking form of .recv() before going to .send(), or, better, use a { blocking | non-blocking } form of .poll( { 0 | timeout }, ZMQ_POLLIN ) before even attempting the .recv(), and keep doing other things if there is nothing to receive yet (so as to avoid throwing the dFSA into an unresolvable collision and flooding your stdout/stderr with a once-per-second flow of printf( "ERROR: %X\n", e.num() );).
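
A minimal sketch of that advice (mine, assuming the same inproc://workers endpoint and the cppzmq poll/recv/send calls the question already uses; an illustration of the poll-then-recv-then-send ordering, not a drop-in fix):

#include <cstring>
#include <zmq.hpp>

//  Hypothetical worker loop: .poll() first, .recv() only once data is ready,
//  so the ZMQ_REP socket never reaches a .send() without a matching prior .recv()
void *worker_routine_polled (void *arg)
{
    zmq::context_t *context = (zmq::context_t *) arg;

    zmq::socket_t socket (*context, ZMQ_REP);
    socket.connect ("inproc://workers");

    zmq::pollitem_t items[] = { { static_cast<void*>(socket), 0, ZMQ_POLLIN, 0 } };

    while (true) {
        zmq::poll (items, 1, 100);                          //  wait at most 100 ms for a request

        if (!(items[0].revents & ZMQ_POLLIN))
            continue;                                       //  nothing to receive yet -- do other work, then poll again

        zmq::message_t request;
        if (!socket.recv (request, zmq::recv_flags::none))  //  data is ready, so this returns promptly
            continue;

        zmq::message_t reply (6);
        memcpy (reply.data (), "World", 6);
        socket.send (reply, zmq::send_flags::none);         //  the REP socket is now in its send-able state
    }
    return NULL;
}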


Error Handling :

Better to use const char *zmq_strerror ( int errnum ); fed by int zmq_errno (void);
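
For example, a sketch against the question's own catch block (zmq::error_t::what() already routes the stored error code through zmq_strerror(), so either form prints a readable message instead of a bare number):

try {
    socket.send (reply, zmq::send_flags::dontwait);
}
catch (const zmq::error_t& e) {
    //  e.num() carries the errno-style code captured at throw time;
    //  zmq_strerror() (and, equivalently, e.what()) turns it into text
    fprintf (stderr, "ERROR %d: %s\n", e.num (), zmq_strerror (e.num ()));
}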


The Problem 1 :

In contrast to the suicidal ::dontwait flag that is the root cause of Problem 2, the root cause of Problem 1 is that the blocking form of the first .recv() moves all the worker threads into an indeterminately long, possibly infinite, waiting state: the .recv() blocks any further step until a real message arrives (which, judging from the MCVE, it seems it never will), so the whole pool of threads remains in a pool-wide blocked-waiting state and nothing will ever happen until a message arrives.


Update on how the REQ/REP works :

The REQ/REP Scalable Communication Pattern Archetype works like a distributed pair of people. One, let's call her Mary, asks (Mary .send()-s the REQ), while the other one, say Bob the REP, listens in a potentially infinitely long blocking .recv() (or takes due care, using .poll(), to orderly and regularly check whether Mary has asked about something, and continues his own hobbies or gardening otherwise). Once Bob's end gets a message, Bob can go and .send() Mary a reply (not before, as he knows nothing about when and what Mary would, or would not, ask in the nearer or farther future). Mary, in turn, is fair enough not to ask her next REQ.send() question of Bob any sooner than after Bob has replied (REP.send()) and she has received Bob's message (REQ.recv()) - which is fair and more symmetric than real life may exhibit among real people under one roof :o)
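
A tiny single-process sketch of that strict alternation (my own illustration over inproc://, using the same cppzmq API; uncommenting the second Mary-side .send() is exactly what would raise the EFSM error):

#include <cstdio>
#include <zmq.hpp>

int main ()
{
    zmq::context_t context (1);

    zmq::socket_t bob (context, ZMQ_REP);                //  Bob answers
    bob.bind ("inproc://mary-bob");

    zmq::socket_t mary (context, ZMQ_REQ);               //  Mary asks
    mary.connect ("inproc://mary-bob");

    mary.send (zmq::buffer ("Hello", 5), zmq::send_flags::none);    //  Mary asks ...
    // mary.send (zmq::buffer ("again", 5), zmq::send_flags::none); //  ... asking again right away would throw EFSM

    zmq::message_t request;
    bob.recv (request, zmq::recv_flags::none);                       //  Bob hears the question ...
    bob.send (zmq::buffer ("World", 5), zmq::send_flags::none);      //  ... and only now may he reply

    zmq::message_t reply;
    mary.recv (reply, zmq::recv_flags::none);                        //  Mary reads the reply; only now may she ask again
    std::printf ("Mary got: %.*s\n", (int) reply.size (), (char *) reply.data ());
    return 0;
}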

The code?

The code is not a reproducible MCVE. The main() creates five Bobs (hanging, waiting for a call from some Mary, somewhere over the inproc:// transport-class), but no Mary ever calls, or does she? There is no visible sign of any Mary trying to do so, much less of her (their - it could even be a dynamic community in an N:M herd-of-Mary(s):herd-of-5-Bobs relation) attempts to handle the REP-ly(s) coming from any of the 5 Bobs.

Persevere - ZeroMQ took me some time of scratching my own head, yet the years since I took due care to learn the Zen-of-Zero have been a rewarding, eternal walk in the Gardens of Paradise. No localhost serial-code IDE will ever be able to "debug" a distributed system (unless a distributed-inspector infrastructure is in place; a due architecture for a distributed-system monitor/tracer/debugger is another layer of distributed messaging/signalling atop of the debugged distributed messaging/signalling system), so do not expect that from a trivial localhost serial-code IDE.

If still in doubt, isolate the potential troublemakers - replace inproc:// with tcp://, and if the toys do not work with tcp:// (where one can wire-line trace the messages), they won't work with the inproc:// memory-zone tricks either.
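
For instance (just a sketch of the substitution; the loopback port number is an arbitrary pick of mine):

//  Same topology, but over tcp:// so the traffic can be wire-line traced (tcpdump, Wireshark, ...)
workers.bind   ("tcp://127.0.0.1:5556");    //  main():   was  workers.bind   ("inproc://workers");
socket.connect ("tcp://127.0.0.1:5556");    //  worker:   was  socket.connect ("inproc://workers");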

user3666197
  • Thanks for the detailed advice. I replaced `recv_flags` with `zmq::recv_flags::none` and now I stopped getting the throws from `send` but returned to the hangs. So I guess this time it's the blocking recv. Will try poll, which seems complex. What I don't understand: the definition of the ZMQ_REP socket says: `ZMQ_REP is used by a service to receive requests from and send replies to a client. This socket type allows only an alternating sequence of zmq_recv(request) and subsequent zmq_send(reply) calls.` Hence an initial recv(request) as in the worker seems the right thing to do. Am I wrong? – kakyo Sep 14 '19 at 01:27
  • Would you mind taking a look at the updated code snippets near the end of my question? I added polling but it still gives me the blocking behaviour at the `recv` call, which now uses `none` as the flag. I tried `dontwait` and `none` on both the sender and receiver ends, same result. I'm pretty beaten at this point. – kakyo Sep 14 '19 at 05:19

About the hanging I described in my UPDATE: I finally figured out what's going on. It was a false expectation on my part.

The sample code in my question was never meant to be a self-contained service/client pair: it is a server-only app with a ZMQ_REP socket. It just waits for some client code to send requests through ZMQ_REQ sockets. So the "hang" I was seeing is completely normal!

As soon as I hook a client app up to it, things start rolling instantly. This chapter is somewhere in the middle of the Guide; I was only concerned with multithreading, so I skipped many earlier code samples and messaging patterns, which led to my confusion.

The code comments even say it's a server, but I expected explicit confirmation from the program. To be fair, the lack of a visual cue and the compiler deprecation warnings caused me, as a new user, to question the sample code, but the story the code tells is valid.

Such a shame about the wasted time! But all of a sudden, everything @user3666197 says in his answer starts to make sense.

For completeness, here is the updated server worker-thread code that works:


// server.cpp

void *worker_routine (void *arg)
{
    zmq::context_t *context = (zmq::context_t *) arg;

    zmq::socket_t socket (*context, ZMQ_REP);
    socket.connect ("inproc://workers");

    while (true) {
        //  Wait for next request from client
        std::array<char, 1024> buf{'\0'};
        socket.recv(zmq::buffer(buf), zmq::recv_flags::none);
        std::cout << "Received request: [" << (char*) buf.data() << "]" << std::endl;

        //  Do some 'work'
        sleep (1);

        //  Send reply back to client
        zmq::message_t reply (6);
        memcpy ((void *) reply.data (), "World", 6);
        try {
            socket.send (reply, zmq::send_flags::dontwait);
        }
        catch (zmq::error_t& e) {
            printf("ERROR: %s\n", e.what());
        }
    }
    return (NULL);
}

The much-needed client code:


// client.cpp

#include <zmq.h>     //  raw libzmq C API
#include <stdio.h>

int main (void)
{
    void *context = zmq_ctx_new ();

    //  Socket to talk to server
    void *requester = zmq_socket (context, ZMQ_REQ);
    zmq_connect (requester, "tcp://localhost:5555");

    int request_nbr;
    for (request_nbr = 0; request_nbr != 10; request_nbr++) {
        zmq_send (requester, "Hello", 6, 0);
        char buf[6];
        zmq_recv (requester, buf, 6, 0);
        printf ("Received reply %d [%s]\n", request_nbr, buf);
    }
    zmq_close (requester);
    zmq_ctx_destroy (context);
    return 0;
}

The server worker does not have to poll manually, because the routing between the client-facing ROUTER socket and the workers is already handled by zmq::proxy in main().

kakyo