
I had to create a random 10GB file, which I can do using dd or fallocate, but the size shown by du -sh is twice what I created:

$ dd bs=1MB count=10000 if=/dev/zero of=foo
10000+0 records in
10000+0 records out
10000000000 bytes (10 GB, 9.3 GiB) copied, 4.78419 s, 2.1 GB/s
$ du -sh foo
19G     foo
$ ls -sh foo 
19G foo
$ fallocate -l 10G bar
$ du -sh bar
20G     bar
$ ls -sh bar
20G bar

Can someone please explain this apparent discrepancy to me?

Cyrus
  • Already answered here https://stackoverflow.com/a/23793037/3833426 – John Dec 15 '22 at 13:53
  • Actually, in your example `du` and `ls` agree on the file size. – user1934428 Dec 15 '22 at 15:21
  • Does this answer your question? [Size() vs ls -la vs du -h which one is correct size?](https://stackoverflow.com/questions/23789031/size-vs-ls-la-vs-du-h-which-one-is-correct-size) – TylerH Dec 15 '22 at 16:45
  • Not quite. I was trying to create a file of 10GB using dd or fallocate, but as per `du -sh` or `ls -sh` I got a file of 20GB. – Tarun Gupta Dec 16 '22 at 05:31
  • In the above case I am not sure whether the file is of 10GB or 20GB. – Tarun Gupta Dec 16 '22 at 05:32
  • Which version of Linux are you using? Which filesystem is used in the folder containing foo? – EchoMike444 Dec 17 '22 at 05:21
  • I am using RHEL 8, and the filesystem in the folder containing foo is GPFS. On a different note, I have also noticed that if I copy this file to a folder on an NFS filesystem, `du -sh` and `ls -sh` show a size of 0, but `stat` shows the correct size of 10GB. How does the filesystem affect the reported size? – Tarun Gupta Dec 17 '22 at 05:44
  • Also, `stat` always shows the correct size of 10GB, irrespective of the filesystem. – Tarun Gupta Dec 17 '22 at 05:46

1 Answer


Wikipedia says this about GPFS:

The system stores data on standard block storage volumes, but includes an internal RAID layer that can virtualize those volumes for redundancy and parallel access much like a RAID block storage system.

I conclude that there is at least one hidden duplicate of every file, so each file actually occupies twice the space of its content. In other words, the underlying RAID layer imposes the double usage.

That would explain it: I created a similarly large file with dd on an ext4 filesystem for other purposes, and there the OS reports a size matching what dd wrote, as intended (no RAID is in effect on my drive).

The fact that stat reports the correct file size, matching what dd wrote, is consistent with the above: stat shows the file's logical length, whereas du and ls -s report the space allocated for it.
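
To see the two numbers side by side on any filesystem, something like the following rough sketch may help. It assumes GNU coreutils (`stat -c`, `du --apparent-size`, `dd status=none`) and uses a small 4 MiB scratch file from `mktemp` instead of the 10GB one; the variable names are only for illustration. On a filesystem that replicates data, the allocated figure would be expected to come out larger than the apparent one.

```shell
# Sketch: compare a file's logical length with the space allocated for it.
# Assumes GNU coreutils; the scratch file and variable names are illustrative.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=4 status=none

apparent=$(stat -c '%s' "$f")   # logical length in bytes (what stat calls Size)
blocks=$(stat -c '%b' "$f")     # number of blocks allocated to the file
blksz=$(stat -c '%B' "$f")      # size in bytes of each block counted by %b
allocated=$((blocks * blksz))   # space actually charged to the file

echo "apparent:  $apparent bytes"
echo "allocated: $allocated bytes"

du -sh --apparent-size "$f"     # du based on the logical length
du -sh "$f"                     # du based on allocated space (the default)
rm -f "$f"
```

Running the same commands against foo on the GPFS mount and against a copy on ext4 or NFS should show whether only the allocated figure changes between filesystems.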

Eric Marceau