I am experiencing suboptimal performance on an EC2 instance connecting to an RDS instance. This particular instance was built before VPCs existed, so all the traffic flows through a single virtual NIC.

The traffic profile is sustained at around 200 Mbit/s, with bursts to the full capacity of the gigabit line for 5-10 seconds; the bursts are all inbound MySQL traffic. The HTTP front-end traffic is fairly light in comparison.

Up until recently, the bursts caused a TCP congestion collapse. I believe that is resolved, but I'm sure there are plenty of tuning opportunities still. I'd love to put in the work, but I'm hoping someone with good networking/shaping knowledge will point me at the things to target next.

Here is what I've done so far:

I increased the initial TCP congestion and receive windows (initcwnd/initrwnd) from 3 to 10 (this is a Linux server).

I did this with a static route:

ip route show
10.x.x.x/26 dev eth0  proto kernel  scope link  src 10.x.x.x 
169.x.x.x/16 dev eth0  scope link  metric 1002 
default via 10.x.x.x dev eth0  initcwnd 10 initrwnd 10

That seems to have alleviated the worst of the TCP congestion collapses.
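For anyone wanting to reproduce this, the change was applied to the default route with a command roughly like the following (gateway redacted, same as above):

ip route change default via 10.x.x.x dev eth0 initcwnd 10 initrwnd 10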

I also prioritized the outbound MySQL traffic using tc:

tc qdisc add dev eth0 root handle 1: prio
tc filter add dev eth0 protocol ip parent 1: prio 1 u32 match ip dport 22 0xffff flowid 1:1
tc filter add dev eth0 protocol ip parent 1: prio 1 u32 match ip dport 3306 0xffff flowid 1:1
tc filter add dev eth0 protocol ip parent 1: prio 1 u32 match ip protocol 1 0xff flowid 1:1

That seems to make the app respond better during periods of congestion. However, we monitor connections to the DB, the HTTP backends, and Java mail, and those now tend to suffer more during the congestion periods, which makes sense.
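(If it helps with diagnosis, the per-band counters of the prio qdisc can be checked with the commands below; I'm watching the drop counts on the lower bands to confirm they really are being starved.)

tc -s qdisc show dev eth0
tc -s class show dev eth0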

So I'm on to TCP tuning, but since HTTP and MySQL traffic are very different, I'm not sure what to target next. Here is netstat -s output from 3 days of uptime; I don't believe we've had a collapse in that time.

netstat -s
Ip:
    699681065 total packets received
    2 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    699681063 incoming packets delivered
    594010173 requests sent out
Icmp:
    938 ICMP messages received
    19 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 885
        timeout in transit: 10
        redirects: 2
        echo requests: 41
    58 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 17
        echo replies: 41
IcmpMsg:
        InType3: 885
        InType5: 2
        InType8: 41
        InType11: 10
        OutType0: 41
        OutType3: 17
Tcp:
    130279 active connections openings
    6710364 passive connection openings
    32963 failed connection attempts
    42347 connection resets received
    922 connections established
    697481982 segments received
    591322387 segments send out
    489523 segments retransmited
    46 bad segments received.
    298575 resets sent
Udp:
    2489966 packets received
    17 packets to unknown port received.
    0 packet receive errors
    2490045 packets sent
UdpLite:
TcpExt:
    4690 SYN cookies sent
    1375 SYN cookies received
    139230 invalid SYN cookies received
    3720 resets received for embryonic SYN_RECV sockets
    5703049 TCP sockets finished time wait in fast timer
    6814 packets rejects in established connections because of timestamp
    21888759 delayed acks sent
    2610 delayed acks further delayed because of locked socket
    Quick ack mode was activated 75329 times
    6082 times the listen queue of a socket overflowed
    6082 SYNs to LISTEN sockets ignored
    269160054 packets directly queued to recvmsg prequeue.
    4952741078 packets directly received from backlog
    34676003473 packets directly received from prequeue
    459594123 packets header predicted
    39118686 packets header predicted and directly queued to user
    65528973 acknowledgments not containing data received
    198865300 predicted acknowledgments
    124 times recovered from packet loss due to fast retransmit
    47018 times recovered from packet loss due to SACK data
    1175 bad SACKs received
    Detected reordering 1842 times using FACK
    Detected reordering 1579 times using SACK
    Detected reordering 14 times using reno fast retransmit
    Detected reordering 1756 times using time stamp
    2698 congestion windows fully recovered
    22329 congestion windows partially recovered using Hoe heuristic
    TCPDSACKUndo: 25842
    11272 congestion windows recovered after partial ack
    40247 TCP data loss events
    TCPLostRetransmit: 3190
    71 timeouts after reno fast retransmit
    13024 timeouts after SACK recovery
    3134 timeouts in loss state
    123997 fast retransmits
    19821 forward retransmits
    61024 retransmits in slow start
    122970 other TCP timeouts
    TCPRenoRecoveryFail: 55
    3400 sack retransmits failed
    35 times receiver scheduled too late for direct processing
    72038 DSACKs sent for old packets
    149 DSACKs sent for out of order packets
    127535 DSACKs received
    146 DSACKs for out of order packets received
    38801 connections reset due to unexpected data
    2113 connections reset due to early user close
    14160 connections aborted due to timeout
    TCPSACKDiscard: 1131
    TCPDSACKIgnoredOld: 981
    TCPDSACKIgnoredNoUndo: 12538
    TCPSpuriousRTOs: 480
    TCPSackShifted: 137435
    TCPSackMerged: 314085
    TCPSackShiftFallback: 254120
    TCPChallengeACK: 3281
    TCPSYNChallenge: 8
    TCPFromZeroWindowAdv: 16084
    TCPToZeroWindowAdv: 16084
    TCPWantZeroWindowAdv: 83130
IpExt:
    InOctets: 624006853592
    OutOctets: 370712171692
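If I'm reading this right, 489,523 retransmitted segments out of 591,322,387 sent is roughly a 0.08% retransmit rate, and the matching TCPFromZeroWindowAdv/TCPToZeroWindowAdv counts (16,084) suggest a receive window that periodically fills up.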

And here is my current sysctl:

# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled.  See sysctl(8) and
# sysctl.conf(5) for more details.

# Controls IP packet forwarding
net.ipv4.ip_forward = 0

# Controls source route verification
net.ipv4.conf.default.rp_filter = 1

# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0

# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0

# Controls whether core dumps will append the PID to the core filename.
# Useful for debugging multi-threaded applications.
kernel.core_uses_pid = 1

# Controls the use of TCP syncookies
net.ipv4.tcp_syncookies = 1

# Disable netfilter on bridges.
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0

# Controls the default maximum size of a message queue
kernel.msgmnb = 65536

# Controls the maximum size of a message, in bytes
kernel.msgmax = 65536

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736

# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296
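As far as I can tell from this file, none of the TCP buffer or congestion-control knobs are being set, so they should still be at distribution defaults; they can be confirmed with something like:

sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem net.ipv4.tcp_congestion_control
sysctl net.core.rmem_max net.core.wmem_max net.core.netdev_max_backlog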