ClickHouse version: 20.3.9.70 (official build). I know this version is no longer supported; we plan to upgrade, but that takes time.

The Setup

We are running three query nodes (nodes with Distributed tables only); we spun up the third one yesterday. All nodes point to the same storage nodes and tables.

The Problem

The new node serves requests just fine over TCP and HTTP for up to 11 hours. After that, the ClickHouse server starts to close TCP connections. HTTP still works just fine when this happens.
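To tell "port unreachable" apart from "server accepts the connection and then drops it", a small probe can help. The following is only a sketch: the function names are made up for illustration, the `/ping` check would point at the HTTP port (8123 by default, which is an assumption about this setup), and it assumes that a healthy native-protocol server stays silent until the client sends its hello packet.

```python
import http.client
import socket
import urllib.request


def probe_native_port(host: str, port: int, wait: float = 2.0) -> str:
    """Classify how a TCP port reacts to a bare connection.

    Returns "connect_failed", "closed_by_peer" or "open".
    Assumption: a healthy server sends nothing until the client speaks,
    so an immediate EOF means the server dropped the connection.
    """
    try:
        sock = socket.create_connection((host, port), timeout=wait)
    except OSError:
        return "connect_failed"
    with sock:
        sock.settimeout(wait)
        try:
            data = sock.recv(1)
        except socket.timeout:
            # Nothing arrived and the socket is still open.
            return "open"
        except OSError:
            # For example a connection reset by the peer.
            return "closed_by_peer"
        # An empty read is EOF: the peer closed its side.
        return "closed_by_peer" if data == b"" else "open"


def http_ping_ok(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/ping answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/ping", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        return False
```

On an affected node one would expect something like `probe_native_port("localhost", 9000)` returning `"closed_by_peer"` while `http_ping_ok("http://localhost:8123")` stays `True`, but that is a guess based on the symptoms, not something verified.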

Extra Information/Evidence

  • The system.metrics.tcp_connection number steadily drops over time for the new node.

[Graph: system.metrics.tcp_connection gradually decreasing over time]

  • netstat shows a lot of TIME_WAIT connections:
netstat -ntp | tail -n+3 | awk '{print $6}' | sort | uniq -c | sort -n
      2 LAST_ACK
    380 CLOSE_WAIT
    386 ESTABLISHED
  29279 TIME_WAIT

Normal node for comparison:

1199 CLOSE_WAIT
1292 ESTABLISHED
186 TIME_WAIT

  • Opening clickhouse-client is not possible:

user@server:~$ clickhouse-client 
ClickHouse client version 20.3.9.70 (official build).
Connecting to localhost:9000 as user default.
Code: 32. DB::Exception: Attempt to read after eof
  • The following shows up in logs:
2021.12.15 19:00:29.215048 [ 25146 ] {e2f742e013b7d83f5d1d6e524afc5d2b} <Warning> ConnectionPoolWithFailover: Connection failed at try №1, reason: Code: 32, e.displayText() = DB::Exception: Attempt to read after eof (version 20.3.9.70 (official build))
2021.12.15 19:03:32.098881 [ 25536 ] {} <Error> ServerErrorHandler: Poco::Exception. Code: 1000, e.code() = 107, e.displayText() = Net Exception: Socket is not connected, Stack trace (when copying this message, always include the lines below):

0. /build/obj-x86_64-linux-gnu/../contrib/poco/Foundation/src/Exception.cpp:27: Poco::IOException::IOException(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x1053e380 in /usr/lib/debug/usr/bin/clickhouse
1. /build/obj-x86_64-linux-gnu/../contrib/poco/Net/src/NetException.cpp:26: Poco::Net::NetException::NetException(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0xe38f6ed in /usr/lib/debug/usr/bin/clickhouse
2. /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/string:2134: Poco::Net::SocketImpl::error(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) (.cold) @ 0xe3a5093 in /usr/lib/debug/usr/bin/clickhouse
3. /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/string:2134: Poco::Net::SocketImpl::peerAddress() @ 0xe3a0633 in /usr/lib/debug/usr/bin/clickhouse
4. /build/obj-x86_64-linux-gnu/../src/IO/ReadBufferFromPocoSocket.cpp:66: DB::ReadBufferFromPocoSocket::ReadBufferFromPocoSocket(Poco::Net::Socket&, unsigned long) @ 0x902ffd7 in /usr/lib/debug/usr/bin/clickhouse
5. /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/type_traits:3696: DB::TCPHandler::runImpl() @ 0x9023905 in /usr/lib/debug/usr/bin/clickhouse
6. /build/obj-x86_64-linux-gnu/../programs/server/TCPHandler.cpp:1235: DB::TCPHandler::run() @ 0x9025470 in /usr/lib/debug/usr/bin/clickhouse
7. /build/obj-x86_64-linux-gnu/../contrib/poco/Net/src/TCPServerConnection.cpp:57: Poco::Net::TCPServerConnection::start() @ 0xe3ac69b in /usr/lib/debug/usr/bin/clickhouse
8. /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/atomic:856: Poco::Net::TCPServerDispatcher::run() @ 0xe3acb1d in /usr/lib/debug/usr/bin/clickhouse
9. /build/obj-x86_64-linux-gnu/../contrib/poco/Foundation/include/Poco/Mutex_STD.h:132: Poco::PooledThread::run() @ 0x105c3317 in /usr/lib/debug/usr/bin/clickhouse
10. /build/obj-x86_64-linux-gnu/../contrib/poco/Foundation/include/Poco/AutoPtr.h:205: Poco::ThreadImpl::runnableEntry(void*) @ 0x105bf11c in /usr/lib/debug/usr/bin/clickhouse
11. /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/memory:2615: void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(void*), Poco::ThreadImpl*> >(void*) @ 0x105c0abd in /usr/lib/debug/usr/bin/clickhouse
12. start_thread @ 0x8184 in /lib/x86_64-linux-gnu/libpthread-2.19.so
13. __clone @ 0xfe03d in /lib/x86_64-linux-gnu/libc-2.19.so
 (version 20.3.9.70 (official build))
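The netstat tallies above mix inbound connections (clients talking to port 9000 on this node) with outbound ones (this node talking to the storage nodes). Splitting them might show which side the TIME_WAIT sockets belong to. A minimal sketch, assuming the usual `netstat -nt` column layout; the function name and the `local_port` parameter are illustrative only:

```python
from collections import Counter
from typing import Optional


def count_tcp_states(netstat_output: str, local_port: Optional[int] = None) -> Counter:
    """Tally the State column of `netstat -nt[p]` output.

    If local_port is given, only rows whose Local Address ends with
    that port are counted (i.e. connections accepted on that port).
    """
    counts: Counter = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows: Proto Recv-Q Send-Q Local-Address Foreign-Address State [PID/Program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip the two header lines and anything malformed
        if local_port is not None and not fields[3].endswith(f":{local_port}"):
            continue
        counts[fields[5]] += 1
    return counts
```

Feeding it the text of `netstat -ntp` once with `local_port=9000` and once without gives the inbound share of each state; the difference is everything else, including connections to the storage nodes.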

Attempted Remedies/Debugging

  • Restarting ClickHouse on the host temporarily fixes the problem. We have tried it once; the same state recurred after 10-11 hours of operation.

  • There are no helpful logs at INFO level before the TCP connections start to dwindle:

# this returns nothing
grep -vE 'Done processing|Client has not sent any data|executeQuery|Processed in' clickhouse-server.log.18-43-to-18-52
  • HTTP still works just fine when this happens
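Instead of excluding known-noisy messages with grep, another way to scan a log window is to count entries per level and logger. This is a rough sketch that assumes the line layout seen in the log excerpts in this post (timestamp, thread id in brackets, query id in braces, level in angle brackets, then `Logger: message`); the function name and the default level set are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Matches e.g.:
# 2021.12.15 19:00:29.215048 [ 25146 ] {query-id} <Warning> SomeLogger: text
LOG_LINE = re.compile(
    r"^\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}\.\d+ "  # timestamp
    r"\[ *\d+ *\] "                                   # thread id
    r"\{[^}]*\} "                                     # query id (may be empty)
    r"<(?P<level>\w+)> "                              # log level
    r"(?P<logger>[^:]+): "                            # logger name
)


def summarize_levels(
    lines: Iterable[str],
    levels: frozenset = frozenset({"Warning", "Error", "Critical", "Fatal"}),
) -> Counter:
    """Count log entries per (level, logger) for the given levels.

    Lines that do not match the layout (stack trace frames, blank
    lines) are ignored.
    """
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("level") in levels:
            counts[(match.group("level"), match.group("logger"))] += 1
    return counts
```

Passing an open log file to `summarize_levels` and printing `most_common()` gives a quick picture of which components complain, and comparing a window before the drop with one after it might show what changes.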
user358656