How to debug inode usage

Question

I am a programmer for a popular website hosted on two web servers running Apache. To give you an idea, we're in the ~1000 Alexa rank range. I'm not a server administrator.

Only one of them (always the same one) has had a problem that first appeared last week: every 2-3 days it grinds to a complete halt, timing out all HTTP and SSH connections. If you are already SSH'd in, typing commands is either extremely slow (20-30 minutes before they appear) or doesn't work at all.

The only temporary fix we have found is rebooting the server.

I noticed this in our graphs:

[image: inode usage graphs]

Every peak corresponds to the server halting, and every drop corresponds to the reboot we were forced to do.

How can I further debug this? Is there a way to see what process is using the most inodes? What would you do?

Do any of the other monitors show anything peculiar? I don't find it likely that reaching 60% inode usage would bring your system to a screeching halt. Maybe the climb in inode usage is just a symptom of something else that might be going wrong. — Safado, Nov 21 '11 at 15:41
@Ryan: I went over all our graphs, I found another "abnormal" one but it's inode related as well (I edited it in). — Andreas Bonini, Nov 21 '11 at 15:47
possible duplicate of [Determine Location of Inode Usage](http://serverfault.com/questions/38907/determine-location-of-inode-usage) — quanta, Nov 21 '11 at 15:48
@Ryan: something else I just found is that the memory allocated for "slab_cache" is considerably higher on www01 (the problematic server) than on www02: 4 GB vs 1.5 GB. The servers and their software are pretty much identical. — Andreas Bonini, Nov 21 '11 at 15:50
And if anyone else can answer this: Why would the inode usage drop after a reboot? Aren't inodes associated with persistent data that's written to disk? — Safado, Nov 21 '11 at 15:50
Ryan, if it's dropping after a reboot, that may indicate some process releasing, say, open file handles upon termination. I would look at process tables, run "lsof" to see if there's any abnormal usage, etc. If this is one out of several identical boxes, I would be suspicious, as there might be an attacker. — cjc, Nov 21 '11 at 16:11
Does your software perchance write a lot of data to `/tmp`? Is `/tmp` perchance not its own filesystem (taking up space on `/` until reboot, when all the data in `/tmp` should get dumped)? --- Also if you're not the server administrator perhaps you should have them come here and post more details. — voretaq7, Nov 21 '11 at 16:18

score 1 · Answer 1 · answered Nov 21 '11 at 21:55

The graphs you've embedded show the usage of in-memory inodes, not the ones on disk. The increase is very likely because the number of open handles (files) on this system is increasing too. Maybe one of your processes is leaking handles; check the "lsof" output to verify that.
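
If wading through raw "lsof" output is awkward, a rough per-process count of open descriptors can point at the leaking process. The following is only a sketch, assuming a Linux box where each process exposes its descriptors under /proc/<pid>/fd; the helper name `count_open_fds` and its `proc_root` parameter are made up for illustration, and without root it will only see processes you own.

```python
import os
from pathlib import Path


def count_open_fds(proc_root="/proc"):
    """Return (fd_count, pid, command_name) tuples, largest count first.

    Hypothetical helper: assumes the Linux /proc layout, where every
    numeric directory is a process and its fd/ subdirectory holds one
    entry per open file descriptor.
    """
    results = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # only numeric directories are processes
        try:
            fd_count = len(os.listdir(entry / "fd"))
        except OSError:
            continue  # process already exited, or permission denied
        try:
            name = (entry / "comm").read_text().strip()
        except OSError:
            name = "?"  # command name not readable
        results.append((fd_count, int(entry.name), name))
    results.sort(reverse=True)
    return results
```

Running `count_open_fds()[:10]` every few minutes and comparing the results should show whether one process's count keeps climbing between reboots, which would be consistent with a handle leak.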
