Load average increasing along with firewall throughput

Question

I've just pushed a large change to my backend code, and in the few hours since the push the load average has increased massively. I checked Munin for clues and noticed that, along with the load average, the firewall throughput had increased hugely too:

[Graph: Firewall Throughput]

This is along with increases in CPU usage, interrupts and load average, which I've added here for completeness:

[Graph: CPU]

[Graph: Interrupts]

[Graph: Load average]

Does anyone know what could be going on here? My first thought was that the code changes put more load on the database (PostgreSQL), but I can't find a reason for the increase in firewall throughput. Traffic has stayed the same; the only difference is the Python code running under Gunicorn. In htop the highest-CPU process alternates between Gunicorn and Postgres, just as it did before (suggesting that Postgres hasn't suddenly become a CPU hog).

EDIT: This is the output from iptables -L -n -v:

Chain INPUT (policy ACCEPT 298K packets, 357M bytes)
 pkts bytes target     prot opt in     out     source               destination
 7705  516K fail2ban-ssh  tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport dports 22

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 296K packets, 372M bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain fail2ban-ssh (1 references)
 pkts bytes target     prot opt in     out     source               destination
   17  1720 REJECT     all  --  *      *       58.218.201.19        0.0.0.0/0            reject-with icmp-port-unreachable
   16  1228 REJECT     all  --  *      *       210.45.250.3         0.0.0.0/0            reject-with icmp-port-unreachable
 7583  505K RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0

UPDATE: I rebooted the whole server and the load average climbed back up to around 7 so I guess this means I can rule out issues with the cache having old data after the changes to the DB schema.

In my experience, merely filtering packets isn't expensive - even with hundreds of lines in the rulesets. Are you perhaps logging these as well? If you have suddenly started logging hundreds more packets per second, that will *definitely* load your system. — MadHatter, Apr 22 '15 at 13:24
My server software (nginx) logs requests but iptables isn't logging anything. The traffic to the site hasn't increased beyond the norm so nginx will be logging at the same rate as it always has before this increase in server load. — benwad, Apr 22 '15 at 13:28
Sorry to belabour the point, but you're *sure* that `iptables` is logging nothing? Any chance you could add the ruleset (`iptables -L -n -v`) to the question, just to be sure? — MadHatter, Apr 22 '15 at 13:29
Thanks, I'm not particularly experienced with iptables so I can't say with 100% certainty that it isn't logging anything. I've added the ruleset to the question. — benwad, Apr 22 '15 at 13:37
Thanks. I agree with you that it's not logging, so that's not it; my apologies for the wrong line of investigation. — MadHatter, Apr 22 '15 at 15:02

score 1 · Answer 1 · answered Apr 22 '15 at 14:15


The name of the Munin plugin is a bit unfortunate, because it doesn't measure anything directly related to firewalling: it shows how many packets the system receives on any interface, and how many packets are forwarded through the system. Hence it doesn't matter how many firewall rules (if any!) you have. It reads the file /proc/net/snmp and monitors the 3rd and 6th values on the "Ip:" line (the InReceives and ForwDatagrams counters).
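To see the raw numbers behind that graph, the same two counters can be read by hand. The following is only a minimal sketch, not the plugin's actual code: it assumes the usual Linux layout of /proc/net/snmp (an "Ip:" header line followed by an "Ip:" value line), where InReceives and ForwDatagrams are the 3rd and 6th values. The function name `ip_counters` and the sample text are made up for illustration.

```python
def ip_counters(snmp_text):
    """Return the Ip: counters from /proc/net/snmp text as a name -> int dict."""
    ip_lines = [line.split() for line in snmp_text.splitlines()
                if line.startswith("Ip:")]
    if len(ip_lines) < 2:
        raise ValueError("expected an Ip: header line and an Ip: value line")
    # First Ip: line holds the counter names, second holds the values.
    names, values = ip_lines[0][1:], ip_lines[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


# Made-up, shortened sample in the kernel's two-line layout (assumption).
SNMP_SAMPLE = (
    "Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams\n"
    "Ip: 2 64 298000 0 0 0\n"
)

if __name__ == "__main__":
    try:
        with open("/proc/net/snmp") as f:
            text = f.read()
    except OSError:
        text = SNMP_SAMPLE  # not on Linux: fall back to the sample
    counters = ip_counters(text)
    print("received: ", counters["InReceives"])
    print("forwarded:", counters["ForwDatagrams"])
```

Running it twice a few seconds apart and subtracting the values gives a packets-per-second figure. Note that these counters cover all interfaces together, loopback included.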

Are you talking to your PostgreSQL server via TCP/IP, or via a Unix domain socket? If via TCP/IP, perhaps queries are being executed twice due to some bug in your changes. Otherwise you will have to research further where those extra incoming packets are coming from.
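One way to answer that question is to look at the host value in the application's database settings. As a rough sketch, assuming the client library follows libpq's convention: an empty or missing host means the default Unix socket, a host that is an absolute path names the socket directory, and anything else (including `localhost` or `127.0.0.1`) goes over TCP/IP. The helper name `postgres_transport` is hypothetical, and the sketch deliberately ignores `hostaddr`, the `PGHOST` environment variable and multi-host lists.

```python
def postgres_transport(host):
    """Classify a libpq-style host setting as 'unix' or 'tcp' (simplified)."""
    if not host:
        # Unset or empty: libpq falls back to its default Unix socket on Unix systems.
        return "unix"
    if host.startswith("/"):
        # Absolute path: the directory that holds the socket file.
        return "unix"
    # Hostname or IP address, including localhost: TCP/IP, counted as network packets.
    return "tcp"


if __name__ == "__main__":
    for host in ("", "/var/run/postgresql", "localhost", "127.0.0.1"):
        print(repr(host), "->", postgres_transport(host))
```

If the result is `tcp` for a database on the same machine, every query and result travels through the loopback interface and shows up in the received-packet counter.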

answered Apr 22 '15 at 14:15

wurtel


Everything is running off one server, so in theory the only traffic in/out of the server would be normal user requests. – benwad Apr 22 '15 at 14:16
localhost tcp/ip traffic will still count as incoming traffic, which is why I asked how you're talking to the SQL server. You should add plugins for each interface to determine the traffic per interface, e.g. `ln -s /usr/share/munin/plugins/if_ /etc/munin/plugins/if_eth0` (repeat for `lo` and any other active interface). – wurtel Apr 22 '15 at 14:19
Ah, my apologies. I just checked and we are using TCP/IP to connect to the postgreSQL server. I'll add those links and see what I find. – benwad Apr 22 '15 at 14:21
I added the symlink for `lo` (the others were already there). Is there anything I have to do before Munin starts generating reports for it? – benwad Apr 22 '15 at 16:00
Restart munin-node. – wurtel Apr 23 '15 at 07:04
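While waiting for the per-interface graphs to fill in, the split between `lo` and the external interface can also be checked directly. This is a minimal sketch that assumes the standard Linux /proc/net/dev layout (two header lines, then one `name: counters...` line per interface, with received packets as the second counter); the function name and the sample figures are invented for illustration.

```python
def rx_packets_by_interface(dev_text):
    """Map interface name -> received packet count from /proc/net/dev text."""
    result = {}
    for line in dev_text.splitlines()[2:]:  # skip the two header lines
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not an interface line
        fields = rest.split()
        result[name.strip()] = int(fields[1])  # fields[0] is rx bytes, fields[1] rx packets
    return result


# Made-up sample values (assumption), laid out like /proc/net/dev.
DEV_SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 90000000  250000    0    0    0     0          0         0 90000000  250000    0    0    0     0       0          0
  eth0: 35000000   48000    0    0    0     0          0         0 37000000   47000    0    0    0     0       0          0
"""

if __name__ == "__main__":
    try:
        with open("/proc/net/dev") as f:
            text = f.read()
    except OSError:
        text = DEV_SAMPLE  # not on Linux: fall back to the sample
    counts = rx_packets_by_interface(text)
    for name, packets in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:>8}: {packets} packets received")
```

If `lo` accounts for most of the growth between two runs, the extra packets are local traffic (for example application-to-PostgreSQL over TCP/IP) rather than outside requests.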
