
I got the details below by running hadoop fsck /:

    Total size:    41514639144544 B (Total open files size: 581 B)
    Total dirs:    40524
    Total files:   124348
    Total symlinks:    0 (Files currently being written: 7)
    Total blocks (validated):    340802 (avg. block size 121814540 B) (Total open file blocks (not validated): 7)
    Minimally replicated blocks:    340802 (100.0 %)

I am using a 256 MB block size, so 340802 blocks * 256 MB = 83.2 TB, and * 3 (replicas) = 249.6 TB. But Cloudera Manager shows 110 TB of disk used. How is that possible?
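For reference, a quick sketch of that arithmetic using the numbers from the fsck output (this assumes every block is a full 256 MB block and a uniform replication factor of 3, which is exactly the assumption in question):

    # Figures from the fsck output above.
    total_blocks = 340_802              # "Total blocks (validated)"
    block_size = 256 * 1024**2          # configured block size: 256 MiB
    replication = 3                     # assumed cluster-wide replication factor

    TIB = 1024**4
    logical = total_blocks * block_size   # ~83.2 TiB if every block were full
    raw = logical * replication           # ~249.6 TiB of expected raw usage
    print(f"{logical / TIB:.1f} TiB logical, {raw / TIB:.1f} TiB raw")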

Naveen

1 Answer


You cannot simply multiply the number of blocks by the block size and the replication factor. Both block size and replication factor can be set and changed dynamically at the level of each individual file.

Hence the computation in the second part of your question need not be correct; in particular, fsck reports an average block size of approximately 120 MB, so most blocks are nowhere near the full 256 MB.

In this case roughly 41.5 TB of data is taking up around 110 TB of raw storage, so the replication factor is also not 3 for all of the files. Whatever Cloudera Manager shows is the correct value.
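A minimal sketch of the calculation this answer is pointing at: multiply the logical data size that fsck reports by the replication factor, rather than the block count by the configured block size (replication 3 is assumed here just for the comparison; the real per-file factors can be lower):

    # Figures from the fsck output in the question.
    total_size_bytes = 41_514_639_144_544   # "Total size" = logical data before replication
    total_blocks = 340_802

    avg_block = total_size_bytes / total_blocks
    print(f"average block size: {avg_block / 1024**2:.0f} MiB")   # ~116 MiB, far below 256 MiB

    TIB = 1024**4
    raw_at_3x = total_size_bytes * 3         # expected raw usage if everything were 3x replicated
    print(f"raw usage at 3x replication: {raw_at_3x / TIB:.1f} TiB")   # ~113 TiB

That ~113 TiB figure is already close to the 110 TB Cloudera Manager reports; the remaining gap is what the answer attributes to some files having a replication factor below 3.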

Durga Viswanath Gadiraju
  • So HDFS reduces the block size for smaller files? I mean, if a file is 8 KB and the block size is 256 MB, will HDFS reduce the block size for that file? Is that what you meant? – Naveen Jan 15 '16 at 05:54
  • HDFS does not reduce the block size. The block size is the maximum size at which files are split into blocks. If HDFS allocated a full block regardless of file size, there would be a lot of wasted storage. You can go to the NameNode web interface, then Utilities, then Browse the file system, and see each individual block ID and its properties. – Durga Viswanath Gadiraju Jan 16 '16 at 03:13
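To make the 8 KB example from the comments concrete, here is a small sketch. HDFS does not pre-allocate blocks: a block on a DataNode's local disk only occupies the bytes actually written to it (plus a small checksum file), so a file smaller than the block size does not waste the rest of the block:

    # Illustrative numbers from the comment above.
    file_size = 8 * 1024              # 8 KiB file
    block_size = 256 * 1024**2        # 256 MiB configured block size
    replication = 3

    # Actual raw usage: only the bytes written, times the replication factor.
    actual = file_size * replication       # 24 KiB
    # What the naive "one full block per file" math would predict.
    naive = block_size * replication       # 768 MiB

    print(f"actual raw usage : {actual / 1024:.0f} KiB")
    print(f"naive estimate   : {naive / 1024**2:.0f} MiB")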