
I'm running a Rails server with Docker on EC2, with a 64G volume. The web service crashed yesterday. When I logged in to the server, I kept getting messages about running out of disk space. I ran `df -h` and `du -sh /*/`, and the results looked like this:

enter image description here

enter image description here

I deleted some logs to free about 3G of space, but the disk was full again within about 30 minutes. I ran `du -sh /*/` again and got the result below.

enter image description here

I couldn't see where the space was going; the only change was that the /var folder had shrunk by 3G.

Any tips would be appreciated.

david0116
  • Probably a better question for [Unix.SE] as it is more system administration than programming. – Nate Eldredge Dec 13 '21 at 03:33
  • Go to /var and do `sudo du -sh *` to see which directory is the biggest and start investigating from there. What's the baseline - as in, how big do you expect /var to be? You can also set up a cron job that does `cd /var && sudo du -sh * >> /tmp/var-size.txt` every 5 minutes. That can tell you which directory is growing fast. `lsof` will show you the list of open files along with the process ID. `ps aux` will tell you which process that PID belongs to. – zedfoxus Dec 13 '21 at 03:44
  • @NateEldredge Alright, done it. Thank you. – david0116 Dec 13 '21 at 03:51
  • @zedfoxus As the attached images show, nothing is growing... I have no idea why my free space keeps decreasing. – david0116 Dec 13 '21 at 03:53
  • @david0116 can you share the full output of `df -h` command? Can you add the output of `mount`? The output of `df -h` shows NVMe disk, which is typically attached to EC2. You're unlikely to run your OS off of that. It's typical to run the OS off of EBS volume. The output of `du -sh` is from your OS and that doesn't add up to 62G. – zedfoxus Dec 13 '21 at 04:03
  • Also check `lsof -n | grep -i deleted`. If large files have been deleted but are still open, their disk space may be freed when the associated process restarts. – zedfoxus Dec 13 '21 at 04:14
  • @zedfoxus Alright, I've added the `df -h` and `mount` information, thank you. – david0116 Dec 13 '21 at 05:49
  • @zedfoxus `lsof -n | grep -i deleted` seems to work. It released 42G of additional space. I killed the ruby production.log process. – david0116 Dec 13 '21 at 05:50
  • But the usage is still increasing. I'll keep watching it and report back here. Thank you for the help! – david0116 Dec 13 '21 at 05:53
  • @david0116 excellent. I will add it as an answer so someone else with the same issue can find it. You are welcome to wait for some more answers before deciding whether an answer should be marked as accepted. – zedfoxus Dec 13 '21 at 05:56

1 Answer


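One likely cause is that a process is still holding large files open after they were deleted. A deleted file keeps occupying disk space until every process that has it open closes it, which usually happens when the process is restarted or, for programs that reopen their log files on SIGHUP, when it receives that signal.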

You can find such files by doing:

    lsof -n | grep -i deleted

This will show you a list of deleted files along with the process holding each one open. You can restart that process to free up the disk space, or send it a SIGHUP signal if it is known to reopen its log files on SIGHUP.
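If `lsof` is not installed, roughly the same information can be read from `/proc` on Linux. The function below is a minimal sketch under that assumption: `deleted_open_files` is a name made up for this example, and it relies on `readlink` and a `stat` that supports `-c` (GNU coreutils or BusyBox). Run it as root to see every process's descriptors.

```shell
#!/bin/sh
# List files that were deleted but are still held open by a process.
# Linux-only: /proc/<pid>/fd/<n> is a symlink whose target ends in
# " (deleted)" once the file has been unlinked.
# Output columns (tab-separated): PID, FD number, size in bytes, path.
deleted_open_files() {
    for fd in /proc/[0-9]*/fd/*; do
        # The descriptor may vanish or be unreadable; skip it in that case.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            /memfd:*) ;;                       # anonymous memory, not disk
            /*" (deleted)")
                pid=${fd#/proc/}
                pid=${pid%%/*}
                size=$(stat -L -c %s "$fd" 2>/dev/null) || size=unknown
                printf '%s\t%s\t%s\t%s\n' "$pid" "${fd##*/}" "$size" "$target"
                ;;
        esac
    done
}
```

Running `deleted_open_files | sort -t "$(printf '\t')" -k3,3nr | head` puts the largest offenders first. If restarting the process is not an option, truncating the file through its descriptor (`: > /proc/<pid>/fd/<n>`, with the PID and FD taken from the output) should release the space as well, although a process that does not write in append mode may leave a sparse file behind.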

To see what's taking up disk space, you will have to keep a watch on a few things. You can create a cron job that runs every 5 minutes (or every 10 minutes or 30 minutes, you choose) that does:

    date >> /tmp/deleted-files.txt && lsof -n | grep -i deleted >> /tmp/deleted-files.txt
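For cron, the one-liner is easier to maintain as a small script. This is only a sketch: the function name `snapshot_deleted` and the script path in the crontab comment are assumptions, not something from the setup described in the question.

```shell
#!/bin/sh
# Append a timestamped snapshot of deleted-but-open files to a log file.
# The output file defaults to the one used in the one-liner above.
snapshot_deleted() {
    out=${1:-/tmp/deleted-files.txt}
    {
        date
        # grep finds nothing when no deleted files are open; that is fine.
        lsof -n 2>/dev/null | grep -i deleted
        echo    # blank line between snapshots
    } >> "$out"
}

# Example crontab entry, every 5 minutes (hypothetical script path that
# defines and then calls snapshot_deleted):
# */5 * * * * /usr/local/bin/snapshot-deleted.sh
```

Each run adds one block starting with the date, so comparing consecutive blocks shows which deleted files keep growing and which process owns them.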

Analyze the file and see whether files are repeatedly being created and deleted.

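If you have identified the directory that keeps on growing, you can also create a cron job that runs every few minutes and saves the file listing to a temporary file, like so (replace /path/to/growing-dir with that directory, since cron does not run the command from inside it):

    date >> /tmp/file-list.txt && ls -ltrh /path/to/growing-dir >> /tmp/file-list.txt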

That way you can watch the files that are being generated and review their contents. It is possible that someone may be logging in debug mode.
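To see growth directly instead of reading listings by eye, two `du` snapshots can be compared. The helpers below are a rough sketch; `du_snapshot` and `du_growth` are names invented for this example, and they assume paths contain no tab characters.

```shell
#!/bin/sh
# Print "size_in_KB<TAB>path" for every entry directly under a directory.
du_snapshot() {
    du -sk "$1"/* 2>/dev/null
}

# Compare two snapshot files and print entries that grew, largest first.
# Output: "growth_in_KB<TAB>path". Entries missing from the old snapshot
# are treated as having started at 0 KB.
du_growth() {
    awk -F '\t' '
        NR == FNR { old[$2] = $1; next }
        { d = $1 - old[$2]; if (d > 0) printf "%d\t%s\n", d, $2 }
    ' "$1" "$2" | sort -rn
}
```

For example, run `du_snapshot /var > /tmp/du-old.txt`, wait a few minutes, run `du_snapshot /var > /tmp/du-new.txt`, and then `du_growth /tmp/du-old.txt /tmp/du-new.txt`. Note that `du` does not count deleted-but-open files, so an empty result while `df` keeps dropping points back to the `lsof` check.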

If you are using Ruby on Rails (RoR), the "Ruby on Rails production log rotation" thread can help you set up log rotation. You can be aggressive about log rotation to keep disk usage under control.
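As one possible setup, the sketch below writes a `logrotate` rule for the Rails logs. The function name and both default paths are placeholders made up for this example, so adjust them to your deployment. `copytruncate` is used because it truncates the log in place rather than moving it, so the Rails process does not need to be restarted or signalled and does not end up holding a deleted file open.

```shell
#!/bin/sh
# Write a logrotate rule for a Rails log directory.
#   $1: destination config file (hypothetical default)
#   $2: glob matching the Rails logs (hypothetical default)
write_rails_logrotate_conf() {
    conf=${1:-/etc/logrotate.d/rails-app}
    logs=${2:-/var/www/app/log/*.log}
    cat > "$conf" <<EOF
$logs {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
EOF
}
```

After writing the file, `logrotate -d <config file>` does a dry run that prints what would be rotated without touching the logs. If the application runs inside a Docker container, the rule has to run wherever the log files are actually visible (the host for a mounted volume, otherwise inside the container).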

One thing I can tell you is that if you attach a 200 GB EBS volume, the cost will be ~$200 for the year, and you will spend less time firefighting this issue. If the time you save is worth more than $200 a year, getting an EBS volume and storing logs on it would be a much cheaper proposition in the long run.
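The yearly figure is just size times the per-GB monthly price times 12. The helper below (a made-up name, for illustration only) does that arithmetic; the price is an input because it depends on volume type and region, so check current EBS pricing before relying on any number.

```shell
#!/bin/sh
# Yearly cost = size in GB x price per GB-month x 12 months.
#   $1: volume size in GB
#   $2: price per GB-month in USD (look this up; it is not hard-coded)
ebs_yearly_cost() {
    awk -v gb="$1" -v price="$2" 'BEGIN { printf "%.2f\n", gb * price * 12 }'
}
```

With an assumed price of $0.08 per GB-month, `ebs_yearly_cost 200 0.08` prints `192.00`, which is in line with the ~$200 estimate.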

zedfoxus
  • Yes, the `lsof -n | grep -i deleted` command gave me the list of deleted files. I got my space back by rebooting the EC2 instance. Thanks – Jose Kj Mar 13 '23 at 17:25