How can I debug this problem?
(I've got full tcpdump captures)
I have a TCP server to which many clients establish persistent connections. Normally all these clients behave, and I never reach the default Linux limit of 1024 open files (and therefore connections) per process.
Yesterday someone (or something) started misbehaving and leaving a lot of open connections, forcing me to restart the server. You can see its behavior on the following munin netstat graph:
Every time the connection count reached 1000, I restarted the server. Only on the fourth occasion did the misbehavior stop, as mysteriously as it had started. Something similar happened one week ago.
All the bad connections come from the same (sub)network: I can isolate them, but there are some valid connections that come from the same network too (so I can't deny connections from that network).
So far I've used tcpdump, Ethereal and ngrep, but I haven't found a way to pick out connections that are established but never transfer any data.
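To make clear what I mean by "established but idle", here is a rough sketch of the classification I'm after. It assumes the packets have already been pulled out of the pcap file into plain tuples. That record format and the function name `find_idle_connections` are made up for illustration, not something tcpdump produces directly.

```python
from collections import defaultdict

# TCP flag bits as they appear in the TCP header.
FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10


def find_idle_connections(packets):
    """Return connections that completed the handshake but carried no payload.

    `packets` is an iterable of
    (src_ip, src_port, dst_ip, dst_port, flags, payload_len) tuples.
    This is a hypothetical record format: it would have to be extracted
    from the capture with a pcap parser first.
    """
    stats = defaultdict(
        lambda: {"syn": False, "synack": False, "payload": 0, "closed": False}
    )
    for src_ip, src_port, dst_ip, dst_port, flags, payload_len in packets:
        a, b = (src_ip, src_port), (dst_ip, dst_port)
        # Use the same key for both directions of a connection.
        key = (a, b) if a <= b else (b, a)
        conn = stats[key]
        if flags & SYN and flags & ACK:
            conn["synack"] = True
        elif flags & SYN:
            conn["syn"] = True
        if flags & (FIN | RST):
            conn["closed"] = True
        conn["payload"] += payload_len
    # Idle = handshake seen, no data in either direction, never closed.
    return sorted(
        key
        for key, c in stats.items()
        if c["syn"] and c["synack"] and c["payload"] == 0 and not c["closed"]
    )
```

This only catches connections whose handshake is inside the capture. Connections opened before the capture started would need a different test.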
- How should I examine the tcpdump (pcap) captures to isolate the misbehaving connections and study them?
- What would you suggest to stop this happening?
Thanks!