I have a server application serving thousands of clients for basic TCP operations. It accepts both plain text and TLS connections.
Some of the most sollicitated servers are facing a strange issue since few months.
After few hours or days, there are many sockets blocked in CLOSE_WAIT
state till server stop serving requests due to the lack of file descriptors.
Sockets graphs here : https://ibb.co/WsvS1D9
We tried to add logging so we can see that sockets are blocked on the first read operation (for plain text sockets) or on the handshake (for TLS) sockets.
When this happens, the async_read_until
and async_handshake
function never finish and the associated deadline_timer
callback is never called. It's like the async method is blocking the thread execution!
Connection starts from acceptor
Log::writeLine("("+ StringUtils::printPointer(static_cast<void *>(this)) + ") [FD:"+
StringUtils::toString(socket().native_handle())+"] "+" Connection opened", Log::LEVEL_INFO);
/// Set a deadline for the read operation.
inout_deadline_.expires_from_now(boost::posix_time::seconds(TIMEOUT));
/// initiate calls to timer check
inout_deadline_.async_wait(
strand_.wrap(boost::bind(&TcpServerConnection::check_deadline,
shared_from_this(), boost::asio::placeholders::error, &inout_deadline_)));
Log::writeLine("("+ StringUtils::printPointer(static_cast<void *>(this)) + ") [FD:"+
StringUtils::toString(socket().native_handle())+"] "+" async_wait initialized", Log::LEVEL_INFO);
if (useTLS_) {
/// Start TLS handshake
Log::writeLine("("+ StringUtils::printPointer(static_cast<void *>(this)) + ") [FD:"+
StringUtils::toString(socket().native_handle())+"] "+" async_handshake initialized", Log::LEVEL_INFO);
socket_.async_handshake(boost::asio::ssl::stream_base::server,
strand_.wrap(
boost::bind(&TcpServerConnection::handle_handshake, shared_from_this(),
boost::asio::placeholders::error)));
} else {
startReadLine();
}
In startReadLine() :
inout_deadline_.expires_from_now(boost::posix_time::seconds(TIMEOUT));
Log::writeLine("("+ StringUtils::printPointer(static_cast<void *>(this)) + ") [FD:"+
StringUtils::toString(socket().native_handle())+"] "+" async_read_until initialized", Log::LEVEL_INFO);
if(useTLS_) {
boost::asio::async_read_until(socket_, lineBuffer_, lineDelimiter_,
strand_.wrap(
boost::bind(&TcpServerConnection::handle_read_line, shared_from_this(),
boost::asio::placeholders::error,
boost::asio::placeholders::bytes_transferred,
false)));
} else {
boost::asio::async_read_until(socket_.next_layer(), lineBuffer_, lineDelimiter_,
strand_.wrap(
boost::bind(&TcpServerConnection::handle_read_line, shared_from_this(),
boost::asio::placeholders::error,
boost::asio::placeholders::bytes_transferred,
false)));
}
Below are 2 examples of log we get for sockets in CLOSE_WAIT
state :
PLAIN SOCKET
Jan 4 18:30:28 INFO: (0x7efcf84865b0) [FD:831] Incoming connection: SOCKET RBUF=262142 SBUF=262142
Jan 4 18:30:28 INFO: (0x7efcf84865b0) [FD:831] Connection opened
Jan 4 18:30:28 INFO: (0x7efcf84865b0) [FD:831] async_wait initialized
Jan 4 18:30:28 INFO: (0x7efcf84865b0) [FD:831] startReadLine() called!
Jan 4 18:30:28 INFO: (0x7efcf84865b0) [FD:831] async_read_until initialized
handle_read_line()
and check_deadline()
are never called
TLS SOCKET
Jan 4 17:33:06 INFO: (0x7efcfc14a7f0) [FD:513] Incoming connection: SOCKET RBUF=262142 SBUF=262142
Jan 4 17:33:06 INFO: (0x7efcfc14a7f0) [FD:513] Connection opened
Jan 4 17:33:06 INFO: (0x7efcfc14a7f0) [FD:513] async_wait initialized
Jan 4 17:33:06 INFO: (0x7efcfc14a7f0) [FD:513] async_handshake initialized
handle_handshake()
and check_deadline()
are never called
We tried to update to last version of Boost (1.75) and OpenSSL (1.1.1i), nothing changed.
All help will be much appreciated :)