Not necessarily an answer but too big to put in a comment.
tcp_mem (since Linux 2.4)
This is a vector of 3 integers: [low, pressure, high]. These bounds, measured in units of the system page size, are used by
TCP to track its memory usage. The defaults are calculated at boot time from the amount of available memory. (TCP can only
use low memory for this, which is limited to around 900 megabytes on 32-bit systems. 64-bit systems do not suffer this limitation.)
low TCP doesn't regulate its memory allocation when the number of pages it has allocated globally is below this number.
pressure When the amount of memory allocated by TCP exceeds this number of pages, TCP moderates its memory consumption. This
memory pressure state is exited once the number of pages allocated falls below the low mark.
high The maximum number of pages, globally, that TCP will allocate. This value overrides any other limits imposed by the
kernel.
Note the following part in particular:
"These bounds, measured in units of the system page size"
Setting tcp_mem to 10000000 10000000 10000000 tells the kernel it may use 39062 MiB of memory for TCP (10,000,000 pages of 4 KiB each). That is nearly triple what you have.
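To make the arithmetic above easy to check, here is a minimal sketch that converts a tcp_mem triple from pages to MiB. The function name `tcp_mem_to_mib` is made up for illustration, and the 4096-byte page size is an assumption (it is what the 39062 MiB figure implies); verify yours with `getconf PAGESIZE`.

```python
def tcp_mem_to_mib(tcp_mem, page_size=4096):
    """Convert a 'low pressure high' tcp_mem string (in pages) to MiB.

    page_size is an assumption (4 KiB); pass your system's real page size.
    """
    return [int(pages) * page_size / 2**20 for pages in tcp_mem.split()]


# The values from the question: each bound works out to about 38 GiB.
print(tcp_mem_to_mib("10000000 10000000 10000000"))
# [39062.5, 39062.5, 39062.5]
```

Running the same function on the boot-time defaults (before you changed them) should give you an idea of what the kernel considers sane for your amount of RAM.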
The second problem is that the 3 values you set for tcp_rmem and tcp_wmem define the min, default and max buffer size per socket. Given that your tcp_mem configuration means you never go into 'memory saving' mode, I imagine that you are actually allocating somewhere between 4k and 16k per socket.
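As a rough illustration of what that per-socket figure adds up to, the sketch below multiplies it by the 2 million connections you are aiming for. The 4k to 16k range is my guess from above, not a measurement, and `total_buffer_mib` is just an illustrative helper name.

```python
def total_buffer_mib(connections, per_socket_bytes):
    """Total socket buffer memory in MiB for a given number of connections."""
    return connections * per_socket_bytes / 2**20


# Assumed per-socket allocation of 4 KiB and 16 KiB (an estimate, not measured).
for kib in (4, 16):
    print(f"{kib} KiB per socket: {total_buffer_mib(2_000_000, kib * 1024)} MiB")
# 4 KiB per socket: 7812.5 MiB
# 16 KiB per socket: 31250.0 MiB
```

That is buffer memory alone, before the kernel's own per-connection bookkeeping or anything your application allocates.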
So, if I was the kernel and I saw such insane settings I might not behave that predictably either.
Try reducing that value to something you can actually use and try again.
Finally, I will point out that you are living in a dream world if you seriously believe that:
- The kernel will support 2 million connections with any comfort.
- Node or java will support 2 million connections with any comfort.
Even under the best circumstances (using an epoll set), 2 million entries in an epoll set is expensive. That's never going to happen with a worker or prefork model.
You need to be spreading this load out more evenly. You probably need another 10 nodes at least to get anything worthy of what a user would call a service.