
Following Michael Caisse's CppCon talk, I created a connection handler MyUserConnection which has a sendMessage method. The sendMessage method adds a message to the queue, similarly to send() in the CppCon talk. My sendMessage method is called from multiple threads outside of the connection handler, at a high rate. The messages must be enqueued chronologically.

When I run my code with only one Asio io_service::run call (i.e. one io_service thread), it async_writes and empties my queue as expected (FIFO). However, the problem occurs when there are, for example, 4 io_service::run calls: then either the queue is not filled in chronological order, or the send calls are not made in chronological order.

class MyUserConnection : public std::enable_shared_from_this<MyUserConnection> {
public:
  MyUserConnection(asio::io_service& io_service, SslSocket socket) :
      service_(io_service),
      socket_(std::move(socket)),
      strand_(io_service) {
  }

  void sendMessage(std::string msg) {
    auto self(shared_from_this());
    service_.post(strand_.wrap([self, msg]() {
      self->queueMessage(msg);
    }));
  }
  
private:
  void queueMessage(const std::string& msg) {
    bool writeInProgress = !sendPacketQueue_.empty();
    sendPacketQueue_.push_back(msg);
    if (!writeInProgress) {
      startPacketSend();
    }
  }

  void startPacketSend() {
    auto self(shared_from_this());
    asio::async_write(socket_,
                      asio::buffer(sendPacketQueue_.front().data(), sendPacketQueue_.front().length()),
                      strand_.wrap([self](const std::error_code& ec, std::size_t /*n*/) {
                        self->packetSendDone(ec);
                      }));
  }

  void packetSendDone(const std::error_code& ec) {
    if (!ec) {
      sendPacketQueue_.pop_front();
      if (!sendPacketQueue_.empty()) { startPacketSend(); }
    } else {
      // end(); // My end call 
    }
  }
  
  asio::io_service& service_;
  SslSocket socket_;
  asio::io_service::strand strand_;
  std::deque<std::string> sendPacketQueue_;
};

I'm quite sure that I misinterpreted the strand and io_service::post when running the connection handler on a multithreaded io_service. I'm also quite sure that the problem is that the messages are not enqueued chronologically, rather than the async_write calls not happening chronologically. How can I ensure that the messages are enqueued in chronological order when sendMessage is called on a multithreaded io_service?

Drejc

2 Answers


If you use a strand, the order is guaranteed to be the order in which you post the operations to the strand.

Of course, if there is some kind of "correct ordering" between the threads that post, then you have to synchronize the posting between them; that's your application domain.
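
To make that concrete in terms of the question's own members (just a sketch; handler stands for any callable):

// Posting a wrapped handler to the io_service does NOT pin the order: with several
// worker threads, either wrapped handler may be picked up first and only then be
// dispatched to the strand.
service_.post(strand_.wrap(handler));

// Posting directly to the strand preserves the order of the post() calls:
strand_.post(handler); // newer Asio equivalent: asio::post(strand_, handler)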

Here's a modernized, simplified take on your MyUserConnection class with a self-contained server test program:

Live On Coliru

#include <boost/asio.hpp>
#include <boost/asio/ssl.hpp>
#include <deque>
#include <iostream>
#include <mutex>

namespace asio = boost::asio;
namespace ssl  = asio::ssl;
using asio::ip::tcp;
using boost::system::error_code;
using SslSocket = ssl::stream<tcp::socket>;

class MyUserConnection : public std::enable_shared_from_this<MyUserConnection> {
  public:
    MyUserConnection(SslSocket&& socket) : socket_(std::move(socket)) {}

    void start() {
        std::cerr << "Handshake initiated" << std::endl;
        socket_.async_handshake(ssl::stream_base::handshake_type::server,
                                [self = shared_from_this()](error_code ec) {
                                    std::cerr << "Handshake complete" << std::endl;
                                });
    }

    void sendMessage(std::string msg) {
        post(socket_.get_executor(),
             [self = shared_from_this(), msg = std::move(msg)]() {
                 self->queueMessage(msg);
             });
    }

  private:
    void queueMessage(std::string msg) {
        outbox_.push_back(std::move(msg));
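        // a send loop is already in flight unless this is the only queued message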
        if (outbox_.size() == 1)
            sendLoop();
    }

    void sendLoop() {
        std::cerr << "Sendloop " << outbox_.size() << std::endl;
        if (outbox_.empty())
            return;

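        // write the front message; the completion handler continues the loop on the socket's executor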
        asio::async_write( //
            socket_, asio::buffer(outbox_.front()),
            [this, self = shared_from_this()](error_code ec, std::size_t) {
                if (!ec) {
                    outbox_.pop_front();
                    sendLoop();
                } else {
                    end();
                }
            });
    }

    void end() {}

    SslSocket                socket_;
    std::deque<std::string>  outbox_;
};

int main() {
    asio::thread_pool ioc;
    ssl::context      ctx(ssl::context::sslv23_server);
    ctx.set_password_callback([](auto...) { return "test"; });
    ctx.use_certificate_file("server.pem", ssl::context::file_format::pem);
    ctx.use_private_key_file("server.pem", ssl::context::file_format::pem);
    ctx.use_tmp_dh_file("dh2048.pem");

    tcp::acceptor a(ioc, {{}, 8989u});

    for (;;) {
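        // accept the socket onto its own strand, so all of its handlers are serialized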
        auto s = a.accept(make_strand(ioc.get_executor()));
        std::cerr << "accepted " << s.remote_endpoint() << std::endl;
        auto sess = std::make_shared<MyUserConnection>(SslSocket(std::move(s), ctx));
        sess->start();
        for(int i = 0; i<30; ++i) {
            post(ioc, [sess, i] {
                std::string msg = "message #" + std::to_string(i) + "\n";
                {
                    static std::mutex mx;
                    // Lock so console output is guaranteed in the same order
                    // as the sendMessage call
                    std::lock_guard lk(mx);
                    std::cout << "Sending " << msg << std::flush;
                    sess->sendMessage(std::move(msg));
                }
            });
        }

        break; // for online demo
    }

    ioc.join();
}

If you run it a few times, you will see that

  • the order in which the threads post is not deterministic (that's up to the kernel scheduling)
  • the order in which messages are sent (and received) is exactly the order in which they are posted.

See live demo runs on my machine:

[screenshot: live demo runs]

sehe
  • 1. (... if you use strand ...) I used my local strand with dispatch `strand_.dispatch(strand_.wrap([self,msg]...))` instead of service_ in `sendMessage` and the order was correct. 2. (method `post`) If I use `asio::post` in `sendMessage`, must I also call the sendMessage method inside an `asio::post`? 3. (A follow-up question) How do I end things so as not to break my code? I saw that `socket_.lowest_layer().close()` must be wrapped inside a strand. I tested it, and if I don't use `strand_.wrap(socket_...close())` then the app crashes with an invalid read. – Drejc Feb 11 '22 at 09:51
  • Of course. All access to the `socket_` must be serialized (because of the [documented thread safety](https://www.boost.org/doc/libs/1_78_0/doc/html/boost_asio/reference/basic_stream_socket.html#boost_asio.reference.basic_stream_socket.thread_safety)). This is in part why I don't have an explicit `close()` in the first place. It will happen automatically in the destructor. – sehe Feb 23 '22 at 19:58
  • I don't precisely know how you would `strand_.wrap(socket_...close())` - that doesn't seem syntactically right. But the similar `post(strand_, [self=shared_from_this()] { self->socket_.lowest_layer().shutdown(); })` or so would make sense. Note that my sample does exactly that inside the public member function `sendMessage()` – sehe Feb 23 '22 at 20:00
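
For completeness, here is a minimal sketch of the shutdown pattern discussed in these comments, assuming the modernized MyUserConnection from the answer above (the body given to the otherwise empty end() is an illustration, not sehe's code):

    void end() {
        // All access to socket_ must be serialized through its (strand) executor,
        // so post the shutdown/close there instead of calling it from an arbitrary thread.
        post(socket_.get_executor(), [self = shared_from_this()] {
            error_code ec;
            self->socket_.lowest_layer().shutdown(tcp::socket::shutdown_both, ec);
            self->socket_.lowest_layer().close(ec); // errors ignored; the peer may already be gone
        });
    }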

On a multi-core, or even on a single-core preemptive OS, you cannot truly feed messages into a queue in strictly chronological order. Even if you use a mutex to synchronize write access to the queue, strict order is no longer guaranteed once multiple writers are waiting on the mutex when it becomes free. At best, the order in which the waiting writer threads acquire the mutex is implementation dependent (it depends on the OS code), but it is safest to assume it is simply random.

That being said, "strict chronological order" is a matter of definition in the first place. To see why, imagine your PC has some digital output bits (one for each writer thread) and you connect a logic analyzer to those bits. Now imagine you pick some spot in the code where each writer toggles its respective bit in your enqueue function. Even if that bit toggle takes place just one assembly instruction before acquiring the mutex, it is possible that the order changed while the writer code was approaching that point. You could also place the toggle at other, earlier arbitrary points (e.g. when you enter the enqueue function), but then the same reasoning applies. Hence, strict chronological order is in itself a matter of definition.

There is an analogy to the case where a CPU's interrupt controller has multiple inputs and you try to build a system which processes those interrupts in strictly chronological order. Even if all interrupt inputs were signaled at exactly the same moment (a switch pulling them all to the signaled state simultaneously), some order would still emerge, caused e.g. by hardware logic, by noise at the input pins, or by the system's interrupt dispatcher function (some CPUs, e.g. the MIPS 4102, have a single interrupt vector, and assembly code checks the possible interrupt sources and dispatches to dedicated interrupt handlers).

This analogy helps to see the pattern: it comes down to asynchronous inputs into a synchronous system, which is a notoriously hard problem in itself.

So the best you can possibly do is to make a suitable definition of your application's "strict ordering" and live with it.

Then, to avoid violations of your definition, you could use a priority queue instead of a plain FIFO and use some atomic counter as the priority (a minimal sketch follows the list):

  • At your chosen point in the code, atomically read and increment the counter. This is your message's sequence number.
  • Assemble your message and enqueue it into the priority queue, using your sequence number as the priority.
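
A possible shape of that scheme, with illustrative names that are not taken from the question's code (note that it simply orders whatever is currently queued; it does not wait for "missing" sequence numbers):

#include <atomic>
#include <cstdint>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <vector>

struct StampedMessage {
    std::uint64_t seq;
    std::string   payload;
};

// std::priority_queue pops its "largest" element, so compare with > to make the
// smallest sequence number come out first.
struct BySequence {
    bool operator()(StampedMessage const& a, StampedMessage const& b) const {
        return a.seq > b.seq;
    }
};

class SequencedOutbox {
  public:
    // Writer side: taking the sequence number is the chosen "ordering point";
    // whoever reads the counter first owns the earlier slot, even if another
    // thread then wins the race for the mutex.
    void push(std::string payload) {
        std::uint64_t seq = nextSeq_.fetch_add(1, std::memory_order_relaxed);
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push({seq, std::move(payload)});
    }

    // Reader side: always hands out the lowest-numbered message currently queued.
    std::optional<StampedMessage> pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty())
            return std::nullopt;
        StampedMessage m = queue_.top();
        queue_.pop();
        return m;
    }

  private:
    std::atomic<std::uint64_t> nextSeq_{0};
    std::mutex mutex_;
    std::priority_queue<StampedMessage, std::vector<StampedMessage>, BySequence> queue_;
};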

Another possible approach is to define a notion of "simultaneous" which is detectable on the other side of the queue (and thus the reader cannot assume strict ordering within a set of "simultaneous" messages). This could be implemented by reading some high-frequency tick count when enqueuing; all messages which carry the same "time stamp" are then considered simultaneous on the reader side.
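
A small illustrative sketch of that idea (hypothetical names; it uses std::chrono::steady_clock as the high-frequency tick source):

#include <chrono>
#include <cstdint>
#include <string>

struct TimestampedMessage {
    std::uint64_t tick;    // steady_clock ticks taken at enqueue time
    std::string   payload;
};

inline std::uint64_t currentTick() {
    using namespace std::chrono;
    return static_cast<std::uint64_t>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
}

inline TimestampedMessage stampMessage(std::string payload) {
    return {currentTick(), std::move(payload)};
}

// Reader side: messages a and b with a.tick == b.tick are "simultaneous" and may
// be processed in either order; only a.tick < b.tick implies a came first.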

BitTickler