Recently I have been debugging a problem on a Unix system using the command

netstat -s

and I got output containing

$ netstat -s

// other fields
// other fields

TCPBacklogDrop: 368504

// other fields
// other fields

I have searched for a while to understand what this field means, and found mainly two different answers:

  1. It means that your tcp-data-receive-buffer is full, and some packets have overflowed
  2. It means your tcp-accept-buffer is full, and there are some disconnections

Which one is correct? Is there any official documentation to support it?

fding

1 Answer

Interpretation #2 refers to the queue of sockets waiting to be accepted, presumably because that queue's size is set (more or less) by the backlog parameter to listen. This interpretation, however, is not correct.

To understand why interpretation #1 is correct (although incomplete), we will need to consult the source. First, note that the string "TCPBacklogDrop" is associated with the Linux identifier LINUX_MIB_TCPBACKLOGDROP (see, e.g., this). This counter is incremented here, in tcp_add_backlog.

Roughly speaking, there are three queues associated with the receive side of an established TCP socket. If the application is blocked on a read when a packet arrives, the packet will generally be sent to the prequeue, for processing in user context by the application process. If it can't be put on the prequeue and the socket is not locked, it will be placed on the receive queue. However, if the socket is locked, it will be placed on the backlog queue for subsequent processing.

If you follow through the code, you will see that the call to sk_add_backlog made from tcp_add_backlog returns -ENOBUFS if the receive queue is full (including what is already sitting on the backlog queue); in that case the packet is dropped and the counter incremented. I say interpretation #1 is incomplete because this is not the only place a packet can be dropped when the "receive queue" is full (which, we now understand, is not as straightforward as a single queue).

I wouldn't expect such drops to be frequent or problematic under normal operating conditions: the sender's TCP stack should honor the receiver's advertised window and not send data exceeding the capacity of the receive queue (the exceptions being zero-window probes and older kernel versions whose calculations could cause drops even when the receive window was not actually full). If it is somehow indicative of a problem, I would start by worrying about malicious clients (some form of DDoS, maybe) or some failure causing a socket's lock to be held for an extended period of time.

JimD.
  • Thanks a lot. The clients use a ton of short TCP connections and complain about a lot of request delay. I've increased some option values (/proc/sys/net/core/somaxconn, /proc/sys/net/ipv4/tcp_max_syn_backlog, /proc/sys/net/core/netdev_max_backlog, /proc/sys/net/ipv4/tcp_*mem) and it works – fding Jul 03 '20 at 02:35
  • What kernel version are you using? A bit of googling shows that older versions could drop packets routed to the backlog queue even when the amount of data sent by the client was less than the advertised receive window. – JimD. Jul 03 '20 at 21:29
  • 4.1.0-33.el6.(my-company-name).x86_64 – fding Jul 06 '20 at 02:45
  • Well, it is not as if I will ever find the source for that, but vanilla 4.1 and the current head do differ in the calculation that determines whether to drop or not: the newer code adds some headroom, since packets placed on the backlog cannot be consolidated. So if a lot of packets are being dropped from the backlog, upgrading the kernel may provide some relief. – JimD. Jul 06 '20 at 21:52
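The sysctls mentioned in the comments above map directly to a persistent configuration fragment. A sketch only: the file name is hypothetical, the values are illustrative rather than recommendations, and the wildcard tcp_*mem entries from the comment are left out as written:

```
# /etc/sysctl.d/90-backlog.conf (hypothetical file name; values illustrative)
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096
net.core.netdev_max_backlog = 16384
```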