Force 100% of CPU usage when using boost::beast::websocket synchronously

Question

I'm trying to spin a dedicated thread pinned to a CPU-core to read some data from a web-socket.

The resources on that core are dedicated mainly to reading data from the socket and parsing it.

I do not want high throughput. I want minimum latency to read/decode/dispatch each message, with as little jitter as possible.

I'm using boost::beast/boost::asio to read the data from the socket in a synchronous, blocking way, so I've disabled thread support by defining BOOST_ASIO_DISABLE_THREADS.

The code looks like this:

while (state_.load(std::memory_order_acquire) == state::running) {
    ws_client_.read(buffer_);
    if (ws_client_.is_message_done()) {
        // ...
    }
}

This code works fine, but it does not satisfy my needs. I can see that the CPU usage has dropped and that the thread is sleeping most of the time.

I've read that the right way to spin a boost::asio service is to call poll or poll_one on the boost::asio::io_context, so I've tried this:

while (state_.load(std::memory_order_acquire) == state::running) {
    ioc.poll();
}

This keeps the CPU usage at 100%, exactly what I want. So I'm doing this:

void connect(...) {
    // connected =)
    ws.async_read(buffer, std::ref(*this));
}


void operator()(const boost::system::error_code& ec, std::size_t bytes_transferred) {
    // ...

    // I'm done parsing, query next
    ws.async_read(buffer, std::ref(*this));
}

void poll() {
    ioc.poll();
}

This seems to solve my issue. Is there a more efficient or elegant way to do this without having to bind a callback function?
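
For reference, the shape of the spin loop above can be sketched without Boost. This is only a minimal sketch under stated assumptions: `spin`, `spin_stats` and `poll_once` are hypothetical names, the `idle` and `stopped` enumerators are made up (only `state::running` appears in the question), and `poll_once` stands in for `ioc.poll()`, which also returns the number of handlers it executed without blocking.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical state enum; only `running` is taken from the question.
enum class state { idle, running, stopped };

struct spin_stats {
    std::size_t iterations = 0;  // total trips around the loop
    std::size_t handlers = 0;    // handlers run, as reported by poll_once
};

// Spins until `s` leaves state::running. `poll_once` is a stand-in for
// ioc.poll(): it must not block and must return how many handlers it ran.
template <class PollOnce>
spin_stats spin(const std::atomic<state>& s, PollOnce&& poll_once) {
    spin_stats stats;
    while (s.load(std::memory_order_acquire) == state::running) {
        stats.handlers += poll_once();
        ++stats.iterations;
    }
    return stats;
}
```

In the real program `poll_once` would presumably be something like `[&] { return ioc.poll(); }`. Note that an io_context that runs out of work becomes stopped, and poll() then returns immediately until restart() is called, so the loop only does useful work while an async_read is outstanding.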

Why do you want to waste power by pegging the CPU at 100%? It's not like you are getting things done any faster that way. In fact, you may be slowing things down by polling aggressively rather than sleeping until there's actually any work to do. — Jesper Juhl, May 09 '20 at 11:25
@JesperJuhl It's a critical path in a low-latency system; keeping the cache hot and the resources in a "warm" state is critical to reduce the jitter. I did benchmark both solutions, and the second is 5x faster in this scenario. — mohabouje, May 09 '20 at 12:02
"I want make sure that the thread is always alive, in "ready-state", to keep the cache-hot, meaning 100% CPU usage." - this sounds like you do not understand what "hot cache" means. — sehe, May 09 '20 at 12:30
If you’re trying to keep your program’s data in the cache for quick retrieval, the way to do that would be to keep other programs from running on the core where they might access other data and flush your data from the cache as a side effect. Spinning the CPU doesn’t bring any further benefit on top of that, since the data in the cache doesn’t spontaneously "time out"; it only gets pushed out when there is contention. Plus, spinning the CPU could invoke thermal throttling, which could slow you down. — Jeremy Friesner, May 09 '20 at 12:32
@JeremyFriesner There is no other program running on this core: I explicitly isolate the core in the kernel and set the affinity of this thread to run there. The reason I'm spinning is to avoid paying ~1-2us each time the thread needs to wake up. I want the minimum latency for each packet. — mohabouje, May 09 '20 at 12:38
@sehe you are completely right, not the best way to describe what I want. I do not want a high-throughput, I want a minimum latency to read/decode/dispatch the message, reducing the jitter to the minimum. This is the main reason for the spinning. — mohabouje, May 09 '20 at 12:57
You will not really gain a lot from burning all those resources. TCP is not necessarily "latency optimized", nor is the kernel's path for delivering segments to you. I doubt you will gain a lot from spinning on the IO service instead of just letting it block on IO completion. If something is 5x faster, something else is off in your program. If you want the lowest-latency path using normal socket APIs, maybe just do a blocking receive on the socket instead of using asio at all. That will avoid the second read syscall after the IO readiness notification, and still not burn CPU cycles. — Matthias247, May 10 '20 at 22:17
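
To make the last comment concrete, here is a rough sketch of receiving directly from the socket with one syscall per attempt and no readiness notification. It assumes a Linux/POSIX stream socket and bypasses Asio and Beast entirely, so WebSocket framing would still have to be layered on top. `spin_recv` is a hypothetical helper, not part of any library. As written it spins, which is the variant the question asks for; dropping `MSG_DONTWAIT` on a blocking socket would give the blocking receive suggested in the comment.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <cerrno>
#include <cstddef>

// Busy-polls `fd` until some bytes arrive, without sleeping in the kernel.
// Returns the number of bytes read, 0 if the peer closed the connection,
// or -1 on a real error (errno is left set by recv).
ssize_t spin_recv(int fd, char* out, std::size_t cap) {
    for (;;) {
        // MSG_DONTWAIT makes this single call non-blocking.
        const ssize_t n = ::recv(fd, out, cap, MSG_DONTWAIT);
        if (n >= 0) {
            return n;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR) {
            continue;  // nothing to read yet: try again immediately
        }
        return -1;
    }
}
```

Whether this actually beats a plain blocking receive is something only a benchmark on the target machine can tell.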
