
I've been following the guide here: https://medium.freecodecamp.org/how-we-fine-tuned-haproxy-to-achieve-2-000-000-concurrent-ssl-connections-d017e61a4d27

It claims they achieved 2 million concurrent SSL connections to HAProxy.

I have one server running Ubuntu 16.04 with 6 cores and 24 GB of RAM. I have set the file limits to unlimited using systemd and can see their values:

# cat /proc/{PID}/limits

Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             96218                96218                processes
Max open files            1048576              1048576              files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       96218                96218                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
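For anyone wanting to reproduce this, here's a sketch of how the limits can be raised via a systemd drop-in (the unit name `haproxy.service` and the exact values are assumptions; adjust to your setup):

```shell
# Hypothetical systemd drop-in raising per-process limits for HAProxy.
# Create /etc/systemd/system/haproxy.service.d/limits.conf containing:
#
#   [Service]
#   LimitNOFILE=2005000
#   LimitNPROC=infinity
#
# then reload systemd and restart the service:
sudo systemctl daemon-reload
sudo systemctl restart haproxy
```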

I'm using vegeta to simulate the load from 4x AWS EC2 c5.9xlarge instances. When I run them against my server, I'm checking connections with:

# ss -s

Total: 24024 (kernel 0)
TCP:   23742 (estab 22106, closed 53, orphaned 58, synrecv 0, timewait 53/0), ports 0

Transport Total     IP        IPv6
*         0         -         -
RAW       0         0         0
UDP       5         3         2
TCP       23689     23688     1
INET      23694     23691     3
FRAG      0         0         0
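If `ss` isn't convenient to script against, a rough count of established connections can also be pulled straight from `/proc` (a minimal sketch; it counts only hex state `01`, i.e. ESTABLISHED):

```shell
# Count ESTABLISHED TCP connections by parsing /proc/net/tcp{,6} directly
# (field 4 is the socket state in hex; 01 = ESTABLISHED), with no
# dependency on ss or netstat being installed.
established=$(awk 'FNR > 1 && $4 == "01"' /proc/net/tcp /proc/net/tcp6 2>/dev/null | wc -l)
echo "established: $established"
```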

I'm pretty happy with the 24k connections, because with a stock install I couldn't get above roughly 7k. But I'm still not achieving anywhere near 2 million.

I'm not sure where I've gone wrong or what's limiting me.

Can you help me understand what I should be checking to find out what's limiting me, and how I can correct it to achieve as many concurrent connections as possible?
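(One thing worth sanity-checking on the load-generator side: each client machine can only hold about one ephemeral port range's worth of connections open to a single destination IP:port. A quick back-of-the-envelope calculation, assuming the default Linux range `net.ipv4.ip_local_port_range = 32768 60999`:)

```python
# Rough upper bound on concurrent connections four load generators can hold
# open against a single destination IP:port, assuming the default Linux
# ephemeral port range (32768..60999) and one source IP per client.
ports_per_client = 60999 - 32768 + 1  # one connection per source port
clients = 4
print(clients * ports_per_client)  # → 112928
```

This is one reason very large connection benchmarks typically spread traffic across many source IPs on the client side.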

EDIT: When the test was run I had a single 10 Gb NIC (VMXNET3, as this is all virtual). I have since added two more 10 Gb NICs, anticipating doing some layer 4 load balancing there.

haproxy config:

global
        log 127.0.0.1:22514 local2 debug
        chroot /var/lib/haproxy
        stats socket /run/haproxy/admin.sock mode 660 level admin
        stats timeout 30s
        user haproxy
        group haproxy
        daemon
        maxconn 2000000
        nbproc 6
        cpu-map 1 0
        cpu-map 2 1
        cpu-map 3 2
        cpu-map 4 3
        cpu-map 5 4
        cpu-map 6 5
        # Default SSL material locations
        ca-base /etc/ssl/certs
        crt-base /etc/ssl/private

        # Default ciphers to use on SSL-enabled listening sockets.
        # For more information, see ciphers(1SSL). This list is from:
        #  https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
        ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS
        ssl-default-bind-options no-sslv3
        tune.ssl.default-dh-param 2048

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        option http-server-close
        timeout connect 50000000
        timeout client  50000000
        timeout server  50000000
        errorfile 400 /etc/haproxy/errors/400.http
        errorfile 403 /etc/haproxy/errors/403.http
        errorfile 408 /etc/haproxy/errors/408.http
        errorfile 500 /etc/haproxy/errors/500.http
        errorfile 502 /etc/haproxy/errors/502.http
        errorfile 503 /etc/haproxy/errors/503.http

frontend loadbalanced_main
    log global
    bind *:80
    mode http
    redirect scheme https if !{ ssl_fc }
    acl web1 hdr(host) -i -m sub 1.mydomain.com
    acl web2 hdr(host) -i -m sub 2.mydomain.com
    acl web3 hdr(host) -i -m sub 3.mydomain.com
    use_backend ordweb1 if web1
    use_backend ordweb2 if web2
    use_backend ordweb3 if web3
    default_backend loadbalanced_nodes

frontend loadbalanced_main_ssl
        log global
        bind *:443 ssl crt /etc/ssl/private/mydomain.com.pem crt /etc/ssl/private/hctb.com.pem
        reqadd X-Forwarded-Proto:\ https
        acl web1 hdr(host) -i -m sub 1.mydomain.com
        acl web1 hdr(host) -i -m sub 1.myotherdomain.com
        acl web2 hdr(host) -i -m sub 2.mydomain.com
        acl web2 hdr(host) -i -m sub 2.myotherdomain.com
        acl web3 hdr(host) -i -m sub 3.mydomain.com
        acl web3 hdr(host) -i -m sub 3.myotherdomain.com
        use_backend ordweb1 if web1
        use_backend ordweb2 if web2
        use_backend ordweb3 if web3
        default_backend loadbalanced_nodes

backend ordweb1
    mode http
    maxconn 2000000
    redirect scheme https if !{ ssl_fc }
    balance roundrobin
    option forwardfor
    http-request set-header X-Forwarded-Port %[dst_port]
    http-request add-header X-Forwarded-Proto https if { ssl_fc }
    option httpchk HEAD / HTTP/1.1\r\nHost:localhost
    server ordweb1 10.154.18.100:80 cookie check

backend ordweb2
    mode http
    maxconn 2000000
    redirect scheme https if !{ ssl_fc }
    balance roundrobin
    option forwardfor
    http-request set-header X-Forwarded-Port %[dst_port]
    http-request add-header X-Forwarded-Proto https if { ssl_fc }
    option httpchk HEAD / HTTP/1.1\r\nHost:localhost
    server ordweb2 10.154.18.8:80 cookie check

backend ordweb3
    mode http
    maxconn 2000000
    redirect scheme https if !{ ssl_fc }
    balance roundrobin
    option forwardfor
    http-request set-header X-Forwarded-Port %[dst_port]
    http-request add-header X-Forwarded-Proto https if { ssl_fc }
    option httpchk HEAD / HTTP/1.1\r\nHost:localhost
    server ordweb3 10.154.18.9:80 cookie check

backend loadbalanced_nodes
    mode http
    maxconn 2000000
    redirect scheme https if !{ ssl_fc }
    balance roundrobin
    option forwardfor
    http-request set-header X-Forwarded-Port %[dst_port]
    http-request add-header X-Forwarded-Proto https if { ssl_fc }
    option httpchk HEAD / HTTP/1.1\r\nHost:localhost
    cookie SRV insert indirect nocache
    server ordweb1 10.154.18.100:80 check cookie ordweb1
    server ordweb2 10.154.18.8:80 check cookie ordweb2
    server ordweb3 10.154.18.9:80 check cookie ordweb3

listen stats
        bind *:1936
        stats enable
        stats uri /
        stats hide-version
        stats auth mydomain:fakeapss
    You didn't post your haproxy config and the networking settings on your server: make sure that you are using all those cores with haproxy and the queues of your NIC cards. There are also bandwidth limitation issues (both as in amount of data / sec and amount of packets per sec) on both ends. Ideally you would make this test on a 1Gbit LAN or even 10. – Florin Asăvoaie Jan 13 '18 at 06:14
  • Also take into consideration stateful firewalls, and other kernel tuning. – Florin Asăvoaie Jan 13 '18 at 06:16
  • @FlorinAsăvoaie could you elaborate? I've also updated the info provided. – Wjdavis5 Jan 14 '18 at 16:58
  • You have a 10G NIC, but is your actual bandwidth from Amazon supporting that much? Please follow a thorough analysis and start checking where is the bottleneck: network, CPU, memory? Use iptraf and see if you see a cap on bandwidth (mbits per sec) or packets per sec. – Florin Asăvoaie Jan 14 '18 at 18:24
  • The problem isn't bandwidth; I was using 4 instances in AWS, all of which had a 10 Gb connection. Further, the endpoint I'm testing returns "pong" from a test method. The problem here is I cannot seem to get more than 24k concurrent connections. – Wjdavis5 Jan 14 '18 at 21:07
  • What I hear is you saying that in theory it isn't a bandwidth problem. However, I don't see any empirical evidence. When you do testing performance, one of the main goals is to find the bottle neck. The bottleneck is always a resource limitation or a software limitation. Both can be in network, cpu, memory, and disk. In your case it is unlikely to be the disk. If you want to listen for my advice and find the bottleneck then solve it, empirically, go ahead. If not, this question doesn't belong to serverfault. – Florin Asăvoaie Jan 15 '18 at 03:29
  • @FlorinAsăvoaie What I meant was I have 4x 10GB instances running in Amazon. You're suggesting my limitation may be bandwidth on the Amazon side becoming saturated before I saturate my local single 10GB nic. I'm using datadog to monitor traffic. I'm not seeing anywhere near 10GB leaving amazon, or coming into my network. So, empirically, it is not a bandwidth limitation. – Wjdavis5 Jan 16 '18 at 01:00
  • And the haproxy server is also running in aws? What you are saying is not empirical. You do not understand what a bottleneck is or what could limit bandwidth or anything else. Unless the CPU is always 100% or you run out of RAM, bandwidth IS your issue, in some way. – Florin Asăvoaie Jan 16 '18 at 09:12
  • @FlorinAsăvoaie - Our conversation has obviously degraded beyond being helpful. Nonetheless I've been able to tune my server to achieve the required concurrent connections. Thanks for the feedback. – Wjdavis5 Jan 19 '18 at 21:40

1 Answer


I was finally able to get things squared away thanks to the help of this article: https://medium.com/@pawilon/tuning-your-linux-kernel-and-haproxy-instance-for-high-loads-1a2105ea553e

For me, the things missing from the other article were the modifications to nf_conntrack and a couple of TCP kernel-level tunings.
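For anyone following along, the relevant knobs are in this family (a sketch with illustrative values, not the exact ones from my setup; size them against your available RAM):

```shell
# Illustrative kernel tunings for very high concurrent connection counts.
# Values are examples only, not a drop-in recommendation.
sysctl -w net.netfilter.nf_conntrack_max=2000000      # raise the conntrack table; the default is far lower
sysctl -w net.ipv4.tcp_max_syn_backlog=100000         # allow deep SYN queues under connection bursts
sysctl -w net.core.somaxconn=65535                    # raise the listen() backlog ceiling
sysctl -w net.ipv4.ip_local_port_range="1024 65000"   # widen ephemeral ports for the backend-facing side
```

To persist these, put the same settings in `/etc/sysctl.d/` and apply with `sysctl --system`.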
