
I'm running a script to find out the disk usage on a wide range of VPSes (from less than 1 GB of usage to over 200 GB of usage)... I'm trying to maintain performance (e.g. no extra load on the VPS) while maintaining accuracy.

df is fast and doesn't produce any disk load that I'm aware of, but it isn't very accurate (I've had it report 0.54 GB of used disk when there was 6+ GB of usage)...

du -s is fast enough on the smaller systems that it doesn't impact performance (it finishes before it matters), and it's accurate and works well, but on a larger system it generates a ton of I/O and can slow down the entire machine.

So I'd like some suggestions on maintaining performance while still getting accurate results.

This script runs every 10 seconds while I'm viewing the status... The data doesn't necessarily need to be 100% accurate on the first pull, but by the 3rd pull it should be. (If that makes sense...)
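For illustration, here is a minimal sketch of the kind of check I mean; the path is just a placeholder, not my real layout:

```bash
#!/bin/bash
# Minimal sketch of the 10-second status poll described above.
# /srv/vps/example is a hypothetical path; substitute the real VPS filesystem.
TARGET="/srv/vps/example"

while true; do
    # df only reads filesystem metadata, so it is cheap but can disagree with du
    df -BG "$TARGET" | awk 'NR==2 {print "df used: " $3}'

    # du -s walks the whole tree: accurate, but heavy I/O on large systems
    du -sBG "$TARGET" | awk '{print "du used: " $1}'

    sleep 10
done
```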

Bravo Delta
  • I don't think it's true that `df` is *inaccurate*; it just reports differently than `du` (in other words, two different ways of reporting on roughly the same data). Anyhow, theoretically the `du` I/O should take less time after the first round, because some of the data will hopefully be cached. – fission Oct 15 '13 at 05:49

1 Answer


The df utility reports disk space at the filesystem level. When it reports differently from du, it is normally because a file has been deleted (so du doesn't see it) but the file is still being held open by a process, so its disk blocks are still in use. It could be argued, then, that df is reporting more accurately than du, because while the file is open its blocks cannot be used by anything else.
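If you want to confirm that this is what is happening, lsof can list files that have been unlinked but are still held open (this is a general technique, not something specific to your script):

```bash
# Show open files whose link count is 0, i.e. deleted but still occupying
# disk blocks; these are counted by df but invisible to du.
lsof +L1

# Roughly equivalent on most Linux systems:
lsof | grep '(deleted)'
```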

You could try using ionice to reduce the I/O priority of your disk scanning, and nice will reduce the CPU priority.
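For example, something along these lines (assuming an I/O scheduler that honours the idle class, e.g. CFQ; the path is a placeholder):

```bash
# Run du with idle I/O priority and the lowest CPU priority so it only
# consumes resources when nothing else on the machine wants them.
ionice -c3 nice -n19 du -s /srv/vps/example
```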

Scanning every 10 seconds is probably way too frequent btw.

user9517