I have a FreeBSD router:
#uname
9.1-STABLE FreeBSD 9.1-STABLE #0: Fri Jan 18 16:20:47 YEKT 2013
It's a powerful computer with a lot of memory:
#top -S
last pid: 45076; load averages: 1.54, 1.46, 1.29 up 0+21:13:28 19:23:46
84 processes: 2 running, 81 sleeping, 1 waiting
CPU: 3.1% user, 0.0% nice, 32.1% system, 5.3% interrupt, 59.5% idle
Mem: 390M Active, 1441M Inact, 785M Wired, 799M Buf, 5008M Free
Swap: 8192M Total, 8192M Free
  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   11 root        4 155 ki31     0K    64K RUN     3  71.4H 254.83% idle
   13 root        4 -16    -     0K    64K sleep   0 101:52 103.03% ng_queue
    0 root       14 -92    0     0K   224K -       2 229:44  16.55% kernel
   12 root       17 -84    -     0K   272K WAIT    0 213:32  15.67% intr
40228 root        1  22    0 51060K 25084K select  0  20:27   1.66% snmpd
15052 root        1  52    0   104M 22204K select  2   4:36   0.98% mpd5
   19 root        1  16    -     0K    16K syncer  1   0:48   0.20% syncer
Its tasks are NAT via ng_nat and a PPPoE server via mpd5.
Traffic through it is about 300 Mbit/s, about 40 kpps at peak. Up to 350 PPPoE sessions are created.
ng_nat is configured by the script:
/usr/sbin/ngctl -f- <<-EOF
mkpeer ipfw: nat %s out
name ipfw:%s %s
connect ipfw: %s: %s in
msg %s: setaliasaddr 1.1.%s
EOF
There are 20 such ng_nat nodes, with about 150 clients.
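The %s placeholders are filled in per node by a wrapper script; a rough sketch of what that loop looks like (the hook numbers, the nat${i} node names and the 1.1.0.${i} alias addresses here are illustrative placeholders, not the real values):
#!/bin/sh
# Illustrative sketch only: create 20 ng_nat nodes hooked into ng_ipfw.
# The real hook numbers, node names and alias addresses are different.
i=1
while [ "$i" -le 20 ]; do
    out_hook=$((i * 10))        # ipfw "netgraph" rule cookie, outgoing traffic
    in_hook=$((i * 10 + 1))     # ipfw "netgraph" rule cookie, incoming traffic
    /usr/sbin/ngctl -f- <<EOF
mkpeer ipfw: nat ${out_hook} out
name ipfw:${out_hook} nat${i}
connect ipfw: nat${i}: ${in_hook} in
msg nat${i}: setaliasaddr 1.1.0.${i}
EOF
    i=$((i + 1))
done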
Sometimes the traffic through NAT stops. When this happens, vmstat reports a lot of FAILs:
# vmstat -z | grep -i netgraph
ITEM                  SIZE   LIMIT   USED    FREE          REQ     FAIL  SLEEP
NetGraph items:         72,  10266,     1,    376,    39178965,       0,     0
NetGraph data items:    72,  10266,     9,  10257,  2327948820,  2131611,  4033
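To catch the moment when FAIL starts growing, I can run a small watch loop like this (the awk field number assumes the comma-separated vmstat -z layout above):
#!/bin/sh
# Log a timestamped line every time the FAIL counter of
# "NetGraph data items" changes (field 6 when splitting on commas).
prev=""
while :; do
    cur=$(vmstat -z | grep -i 'NetGraph data items' | awk -F, '{gsub(/ /, "", $6); print $6}')
    if [ "$cur" != "$prev" ]; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') NetGraph data items FAIL=${cur}"
        prev="$cur"
    fi
    sleep 5
done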
I tried increasing net.graph.maxdata and net.graph.maxalloc to 10240, but this didn't help.
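As far as I understand, these are boot-time tunables (read-only sysctls at runtime), so the change goes into /boot/loader.conf and needs a reboot:
# /boot/loader.conf
# net.graph.* limits are read-only sysctls at runtime, so they are raised
# here and take effect only after a reboot (10240 is the value I tried)
net.graph.maxdata="10240"
net.graph.maxalloc="10240"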
This is a new problem (it started 1-2 weeks ago). The configuration had been working well for about 5 months, and no configuration changes were made before the problems started.
In the last few weeks traffic has grown slightly (from 270 to 300 Mbit/s), along with a few more PPPoE sessions (300 -> 350).
Please help me find and solve this problem.
UPD: Info about the network cards:
# pciconf -lv | grep -B3 network
em0@pci0:0:25:0: class=0x020000 card=0x35788086 chip=0x15028086 rev=0x05 hdr=0x00
vendor = 'Intel Corporation'
device = '82579LM Gigabit Network Connection'
class = network
--
em1@pci0:2:0:0: class=0x020000 card=0x35788086 chip=0x10d38086 rev=0x00 hdr=0x00
vendor = 'Intel Corporation'
device = '82574L Gigabit Network Connection'
class = network
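The NIC side can be checked with the standard tools, e.g.:
# interrupt rates per device (look at the em0/em1 lines)
vmstat -i
# per-second packet/error/drop counters on the uplink
netstat -w 1 -I em1
# full dump of em(4) driver counters and settings (OID names vary by driver version)
sysctl dev.em.0
sysctl dev.em.1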
UPD: There are two "top" outputs at https://gist.github.com/korjavin/9190181
from when I switched net.isr.dispatch to hybrid. After that I got tons of mpd processes (I don't know why) and one CPU at 100% interrupt, and after about 10 minutes it was rebooted because of heavy packet loss.
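For reference, the change itself was just this runtime sysctl; the commented-out loader.conf lines are the related netisr knobs with example values only, not something I have tested on this box:
# runtime: switch the netisr dispatch policy (this is what triggered the behaviour above)
sysctl net.isr.dispatch=hybrid

# /boot/loader.conf -- related boot-time netisr knobs, example values only:
# net.isr.maxthreads=4
# net.isr.bindthreads=1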
UPD: It happened again. There is "top" output from before and after the reboot at https://gist.github.com/korjavin/9254734
It looks like the problem is in the ng_queue process, which is eating too much CPU. Since my first post there are many more sessions and much more traffic: about 400 PPPoE sessions and 450 Mbit/s.
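If more data is needed, I can collect something like this the next time it happens:
# kernel threads individually, to see which ng_queue threads are busy
top -SH
# list all netgraph nodes (should be the 20 nat nodes plus the mpd5 hooks)
ngctl list
# netgraph zone counters again, to correlate FAIL growth with the ng_queue load
vmstat -z | grep -i netgraph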