
This must be a glitch in the Matrix. Right? Let me explain.

I have some automation that converts a qcow2 image into a raw image before uploading the resulting raw image to an S3 bucket (in AWS). Prior to Friday, this automation was written specifically for RHEL qcow2 images, but due to new requirements, it has been adapted to handle BIGIP qcow2 images.

Until today, because of the initial limited scope of the automation, it didn't have a mechanism to calculate the virtual size of the source qcow2 image (e.g., using qemu-img info example.qcow2). That aside, when the raw image was produced, a simple ls -lth revealed that it was 81GB:

-rw-r----- 1 d staff 81G Aug 7 01:52 f5-bigip-17.0u4.x86_64.raw
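
For what it's worth, extending the automation to grab the virtual size up front should only take something along these lines (a sketch, assuming jq is available on the box running qemu-img):

$ qemu-img info --output=json f5-bigip-17.0u4.x86_64.qcow2 | jq '."virtual-size"'

That should print the same 86973087744 bytes shown in the human-readable qemu-img output further down.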

Now, if it had been written to a filesystem with adequate space to accommodate a file of this size, I wouldn't have thought anything of it. However, before this file was created, there was only ~25GB of space available on the filesystem:

[13662: 2022-0807 at 01:32:43: d@mac0 ~/Downloads]
$ df -h /
Filesystem     Size   Used  Avail Capacity iused               ifree %iused  Mounted on
/dev/disk1s1  932Gi  903Gi   25Gi    98% 5934135 9223372036848841672    0%   /

What does line up is that after the raw image was created, and despite what ls reported its size to be, df showed the filesystem had ~6GB less available:

[13664: 2022-0807 at 01:51:40: d@mac0 ~/Downloads]
$ df -h /
Filesystem     Size   Used  Avail Capacity iused               ifree %iused  Mounted on
/dev/disk1s1  932Gi  909Gi   19Gi    98% 5934638 9223372036848841169    0%   /

... which happens to match the disk size of the source qcow2 image:

# qemu-img info f5-bigip-17.0u4.x86_64.qcow2
image: f5-bigip-17.0u4.x86_64.qcow2
file format: qcow2
virtual size: 81G (86973087744 bytes)
disk size: 5.8G
cluster_size: 65536
Format specific information:
    compat: 0.10
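
For context, the qcow2-to-raw step boils down to a qemu-img convert call along these lines (illustrative; -p just shows progress and -O raw selects the output format — the exact invocation in the automation may differ, and it runs on the RHEL guest against the shared Downloads directory):

# qemu-img convert -p -O raw f5-bigip-17.0u4.x86_64.qcow2 f5-bigip-17.0u4.x86_64.raw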

Up to this point, I could reasonably convince myself of how such a discrepancy might exist. The only problem is that since the automation started uploading the raw image to the S3 bucket, it has (at the time of this posting) already uploaded more than 55GB of the image:

[Screenshot: upload progress after 1 hour]

Some technical specs about the systems on which this is executing:

Host System: Macbook Pro (/ is on an encrypted APFS volume)

Guest System: RHEL 7.4 (via VMware Fusion v10)

The qcow2 and raw images exist on the host system in the Downloads directory, which is shared with/accessible from the guest system.

Unless there's some type of compression and/or dedup happening on the host system's filesystem where these files live, how is it possible that an 81GB file was written to a filesystem that only had ~25GB available, and that more than 55GB of it has already been copied up to an S3 bucket?
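
If it helps anyone answer, this is the kind of check I can run on the host to compare the size ls reports against what's actually allocated on disk (illustrative; on macOS, ls -s prints the allocated 512-byte blocks and du reports the space the file really occupies):

$ ls -ls f5-bigip-17.0u4.x86_64.raw
$ du -h f5-bigip-17.0u4.x86_64.raw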

Thanks to all for your help answering this question.

~D


Additional Info:

Going back over some of the output in my terminal session on the host machine (i.e., my MBP), I noticed that when I offloaded two other raw images (one RHEL 7.9 image and one RHEL 8.4 image), each 10GB (according to ls), I only gained ~11GB back, as seen via df:

[13658: 2022-0807 at 01:24:50: d@mac0 ~/Downloads]
$ df -h /
Filesystem     Size   Used  Avail Capacity iused               ifree %iused  Mounted on
/dev/disk1s1  932Gi  914Gi   14Gi    99% 5933944 9223372036848841863    0%   /
[13659: 2022-0807 at 01:26:44: d@mac0 ~/Downloads]
$ ls -ltr *.raw
-rw-------  1 d  staff  10737418240 Aug  9  2021 el-server-8.4.x86_64.raw
-rw-------  1 d  staff  10737418240 Jan 20  2022 el-server-7.9u3.x86_64.raw
[13661: 2022-0807 at 01:26:51: d@mac0 ~/Downloads]
$ time mv *.raw /Volumes/Elements/home/Downloads/

real    4m54.121s
user    0m5.227s
sys 0m38.865s
[13662: 2022-0807 at 01:32:43: d@mac0 ~/Downloads]
$ df -h /
Filesystem     Size   Used  Avail Capacity iused               ifree %iused  Mounted on
/dev/disk1s1  932Gi  903Gi   25Gi    98% 5934135 9223372036848841672    0%   /

1 Answer


Unless you selected an option to preallocate all of the disk space when creating the VM, the image file created is a sparse file. Sparse files do not contain any data in places where there would be only zeros; they just contain indicators that there should be a particular number of zero blocks at that place. So the actual size of a sparse file may be much less than the size indicated in the directory listing.
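
You can see the effect for yourself with a quick experiment (illustrative, on Linux):

$ truncate -s 1G sparse.img   # create a 1 GiB sparse file without writing any data
$ ls -lh sparse.img           # apparent size: 1.0G
$ du -h sparse.img            # allocated size: 0, since no data blocks were ever written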

Here is a very short article about sparse files: https://www.lisenet.com/2014/so-what-is-the-size-of-that-file/

raj
  • Thank you for your response! I haven't done any testing to validate that the observed behavior is due to the raw images being sparse files, but it completely makes sense and lines up. Even after being in the field, on the command line, for just over 17 years now, this is the first time I'm hearing about sparse files. Then again, if not for the seemingly blatant discrepancy between the available storage and the reported size of the raw image, I would've continued on without ever learning about 'em. I'm going to preemptively mark this as the correct answer. Thanks again ... cheers! – dmas174 Aug 09 '22 at 12:09