
I am writing a simple monitoring script to which I would like to add disk space checks. However, I found that the reported free space differs between the system's df and shutil.disk_usage().

On a system which has three disks mounted:

# df / /mnt/2TB1 /mnt/1TB1
Filesystem      1K-blocks       Used Available Use% Mounted on
/dev/sda1       472437724  231418380 216997128  52% /
/dev/sdb1      1921802520 1712163440 111947020  94% /mnt/2TB1
/dev/sdc1       960380648  347087300 564438888  39% /mnt/1TB1

# python3
Python 3.6.8 (default, Jan 14 2019, 11:02:34)
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import shutil
>>> (t, u, f) = shutil.disk_usage('/')
>>> (t, u, f)
(483776229376, 236973805568, 222203674624)
>>> u/t
0.48984177224594366
>>> (t, u, f) = shutil.disk_usage('/mnt/2TB1')
>>> (t, u, f)
(1967925780480, 1753255362560, 114633748480)
>>> u/t
0.8909153891628782
>>> (t, u, f) = shutil.disk_usage('/mnt/1TB1')
>>> (t, u, f)
(983429783552, 355400192000, 578002624512)
>>> u/t
0.361388477290517

The differences are 3, 5 and 3 percentage points, respectively. Where do they come from, and which result is the correct one?

WoJ
  • Can you post the values of `u` and `t`? Right now we don't know which of these values differs from `df`'s values. – Socowi Jul 01 '19 at 14:49
  • @Socowi: you're right -- I updated the question with the information – WoJ Jul 01 '19 at 15:11

3 Answers


Python appears to have the correct results.
By default, [man7]: DF(1) (man df) displays numbers (sizes) in 1 KiB blocks. But since the same operation (division by 1024) is applied to both the dividend and the divisor when computing the percentage, it cancels out, so it shouldn't affect the final result.
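A quick way to check the cancellation argument: take the / figures from the df and df -B 1 runs in the example below (copied by hand, nothing is read from the system) and compute the ratio in both units. The byte values are exactly 1024 times the KiB values and all of these integers fit exactly in a float, so the two ratios should come out identical, matching the 10.777% printed by the Python script.

```python
# Sizes for "/" as printed by df (1 KiB blocks) and df -B 1 (bytes),
# copied from the example output in this answer.
used_kib, total_kib = 10999896, 102067544
used_b, total_b = 11263893504, 104517165056

# The byte values are exactly 1024 times the KiB values ...
assert used_b == used_kib * 1024 and total_b == total_kib * 1024

# ... so the factor 1024 cancels out of the ratio.
ratio_kib = used_kib / total_kib
ratio_b = used_b / total_b
print(f"KiB: {100 * ratio_kib:.3f}%  bytes: {100 * ratio_b:.3f}%")
```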

Example (for a certain dir):

  1. Run df (by default, output in KiB)
  2. Run df -B 1 (output in bytes)
  3. Run the following Python script:

    import sys, shutil
    
    path = sys.argv[1] if len(sys.argv) > 1 else "/"
    t, u, f = shutil.disk_usage(path)
    percent = 100 * u / t
    print("(Python) - Volume name\t{:} {:} {:} {:.3f}% ({:.0f}) {:}".format(t, u, f, percent, percent, path))
    
[cfati@cfati-ubtu16x64-0:~]> for f in "/" "/media/sf_shared_00"; do echo df "${f}" && df ${f} && echo df -B 1 "${f}" && df -B 1 ${f} && echo Python script on "${f}" && python3 -c "import sys, shutil; path = sys.argv[1] if len(sys.argv) > 1 else \"/\"; t, u, f = shutil.disk_usage(path); percent = 100 * u / t; print(\"(Python) - Volume name\t{:} {:} {:} {:.3f}% ({:.0f}) {:}\".format(t, u, f, percent, percent, path))" ${f} && echo && echo; done
df /
Filesystem                                   1K-blocks     Used Available Use% Mounted on
/dev/mapper/ubtu16x640_lvg0-ubtu16x640_root0 102067544 10999896  85859792  12% /
df -B 1 /
Filesystem                                      1B-blocks        Used   Available Use% Mounted on
/dev/mapper/ubtu16x640_lvg0-ubtu16x640_root0 104517165056 11263893504 87920427008  12% /
Python script on /
(Python) - Volume name  104517165056 11263893504 87920427008 10.777% (11) /


df /media/sf_shared_00
Filesystem     1K-blocks      Used Available Use% Mounted on
shared_00      327679996 155279796 172400200  48% /media/sf_shared_00
df -B 1 /media/sf_shared_00
Filesystem        1B-blocks         Used    Available Use% Mounted on
shared_00      335544315904 159006511104 176537804800  48% /media/sf_shared_00
Python script on /media/sf_shared_00
(Python) - Volume name  335544315904 159006511104 176537804800 47.388% (47) /media/sf_shared_00

As seen, the numbers (sizes) from step #2 are identical to the ones from step #3. Computing the percentage by hand (from any of the 3 outputs) agrees with Python's value, so the Python percentage seems to be the correct one.
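Why are the byte values identical? As far as I can tell from CPython's source, on POSIX systems shutil.disk_usage() is a thin wrapper around os.statvfs(), i.e. the same statfs data that df reads. The sketch below recomputes the tuple by hand. disk_usage_statvfs is a hypothetical helper name, and the field mapping (f_blocks, f_bfree, f_bavail, f_frsize) reflects my reading of the shutil source rather than a documented guarantee. It only runs on Unix-like systems.

```python
import os
import shutil


def disk_usage_statvfs(path):
    """Hypothetical helper: rebuild (total, used, free) in bytes from
    os.statvfs(), the way CPython's shutil.disk_usage() does on POSIX."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize                # every data block, reserved ones included
    used = (st.f_blocks - st.f_bfree) * st.f_frsize  # blocks that are not free
    free = st.f_bavail * st.f_frsize                 # blocks an unprivileged user may still use
    return total, used, free


path = "/"
print("statvfs:", disk_usage_statvfs(path))
print("shutil: ", tuple(shutil.disk_usage(path)))
```

As far as I know, the same three fields back df's 1K-blocks, Used and Available columns, which would explain why df -B 1 and shutil agree byte for byte.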

It's unclear to me why df reports those percentages (I didn't look at the source code), but it could be (everything that follows is pure speculation):

  • It tends to be user-protective (reporting a slightly higher percentage than the actual one)
  • It has something to do with logical disk units (sectors).
    For example, on a disk with 4 KiB (4096-byte) sectors, a 4097-byte file would nominally occupy 4097 bytes, but since the disk's logical unit is the sector (not the byte; this is somewhat similar to #pragma pack), the file takes 2 sectors (8 KiB), so its underlying size is greater than the reported one
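The block-rounding arithmetic from the second bullet, as a tiny sketch. The 4096-byte block size is only an assumed example value (real block sizes depend on the file system), and allocated_size is a hypothetical name. Keep in mind that, as the identical byte values above suggest, df's Used and shutil's used come from the same block counts, so any such rounding would show up in both tools alike.

```python
def allocated_size(file_size, block_size=4096):
    """Bytes taken by a file when space is handed out in whole blocks.
    block_size=4096 is only an example value."""
    blocks = -(-file_size // block_size)  # ceiling division for non-negative integers
    return blocks * block_size


for size in (0, 1, 4096, 4097):
    print(f"{size:5} bytes -> {allocated_size(size):5} bytes on disk")
```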
CristiFati

As CristiFati already pointed out, the ratio used / total is the same for both tools, but the Use% field reported by df differs from 100 · used / total.

As an example, let's examine the values for /dev/sda1 mounted on /.

df.total = 472437724
df.used = 231418380
df.available = 216997128
df.percentage = 52

shutil.total = 483776229376
shutil.used = 236973805568
shutil.free = 222203674624

df.used / df.total = 0.4898 = shutil.used / shutil.total
but …
df.used / df.total = 0.4898 ≠ 0.52 = df.percentage / 100

The source code of coreutils' df implementation sheds some light on this issue. The three lines 1171-1173 are relevant. pct is the percentage.

    uintmax_t u100 = v->used * 100;
    uintmax_t nonroot_total = v->used + v->available;
    pct = u100 / nonroot_total + (u100 % nonroot_total != 0);

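As we can see, df does not compute used / total but used / (used + available), where available is what shutil calls free, and it rounds the result up to the next integer (that is what the + (u100 % nonroot_total != 0) term does). Note that used + available < total.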
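The three C lines translate directly into Python. Feeding the function the df numbers from the question (in KiB) should reproduce df's Use% column (52, 94 and 39), while used / size gives the lower values that shutil led to. df_use_percent is a hypothetical name, and returning None for an empty file system is my own choice, not something taken from the coreutils snippet.

```python
def df_use_percent(used, available):
    """Use% as computed by the coreutils df snippet quoted above:
    100 * used / (used + available), rounded up to the next integer."""
    u100 = used * 100
    nonroot_total = used + available
    if nonroot_total == 0:
        return None  # assumption: no meaningful percentage for an empty file system
    # Floor division plus 1 when there is a remainder == ceiling division.
    return u100 // nonroot_total + (u100 % nonroot_total != 0)


# (size, used, available) in 1 KiB blocks, from the df output in the question.
disks = {
    "/": (472437724, 231418380, 216997128),
    "/mnt/2TB1": (1921802520, 1712163440, 111947020),
    "/mnt/1TB1": (960380648, 347087300, 564438888),
}

for mount, (size, used, avail) in disks.items():
    print(f"{mount:10} 100*used/size = {100 * used / size:6.2f}%   "
          f"df Use% = {df_use_percent(used, avail)}%")
```

Using // and % keeps everything in exact integer arithmetic, like the uintmax_t code, so there is no float rounding to worry about.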

I suspected that …

total includes space that is reserved for metadata, such as which file resides where in the file system (depending on the file system, this can include FAT tables, inodes, …). Since you cannot use that space for regular files, df excludes it from Use% by dividing by (used + free) instead, which does not include metadata.

However, a test revealed that …

this cannot be the complete story. The following script generates a FAT12 and an ext2 file system inside a 2 MiB file. The script has to be executed using sudo.

#! /bin/bash

check() {
  head -c 2MiB /dev/zero > fs
  mkfs."$@" fs
  mkdir fsmount
  mount -o loop fs fsmount
  df fsmount
  umount fsmount
  rm -r fs fsmount
}

echo fat12:
check fat -F 12

echo ext2:
check ext2

I got the output

fat12:
[...]
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/loop0          2028     0      2028   0% /tmp/fsmount
ext2:
[...]
Creating filesystem with 2048 1k blocks and 256 inodes
[...]
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/loop0          2011    21      1888   2% /tmp/fsmount

Note that both total sizes are smaller than the file system image, which is 2048 KiB = 2 MiB in both cases. Neither file system contained any files, but for ext2 df reported 21 KiB as used (this may be related to this question).
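One more data point can be pulled out of the numbers already posted in this thread: the part of the size that df reports neither as Used nor as Available. The sketch below just does that subtraction on the hand-copied df outputs. On the ext2 test image, on all three of the OP's disks and on the Ubuntu root it comes out at roughly 5% of the size, while it is 0 for FAT12 and for the sf_shared_00 share. That pattern would be consistent with the reserved blocks of ext2/3/4 (mke2fs reserves 5% for root by default, adjustable with tune2fs -m), but the file system types of the OP's disks are my assumption, not something shown in the question.

```python
# (size, used, available) in 1 KiB blocks, copied from the df outputs in this thread.
samples = {
    "question /": (472437724, 231418380, 216997128),
    "question /mnt/2TB1": (1921802520, 1712163440, 111947020),
    "question /mnt/1TB1": (960380648, 347087300, 564438888),
    "Ubuntu /": (102067544, 10999896, 85859792),
    "sf_shared_00": (327679996, 155279796, 172400200),
    "test ext2": (2011, 21, 1888),
    "test fat12": (2028, 0, 2028),
}

for name, (size, used, avail) in samples.items():
    hidden = size - used - avail  # counted in size, but neither Used nor Available
    print(f"{name:20} hidden = {hidden:10} KiB ({100 * hidden / size:.2f}% of size)")
```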

Socowi

Once, 1 GB meant 1024 MB, but manufacturers changed that convention after they discovered the marketing trick of calling 50000 MB 50 GB.

So the difference lies in how the two implementations handle those prefixes: as 1000 or as 1024.

ipaleka
  • I don't think this is the issue here. The numbers reported by `df` and `shutil` are in bytes; no `k/ki`, `M/Mi`, `G/Gi`, ... just plain bytes. And even if the units were different between `df` and `shutil`, the percentage shouldn't change. – Socowi Jul 01 '19 at 14:58
  • I have to correct myself. OP's output of `df` lists sizes in units of 1024 bytes. `shutil` uses bytes as units. However, the percentage should be the same anyway. – Socowi Jul 01 '19 at 19:19