How do I find the size of an HDFS file? Which command should be used to find the size of a file in HDFS?
7 Answers
I also find myself using `hadoop fs -dus <path>` a great deal. For example, if a directory on HDFS named "/user/frylock/input" contains 100 files and you need the total size for all of those files, you could run:
hadoop fs -dus /user/frylock/input
and you would get back the total size (in bytes) of all of the files in the "/user/frylock/input" directory.
Also, keep in mind that HDFS stores data redundantly, so the actual physical storage used by a file might be 3x or more than what is reported by `hadoop fs -ls` and `hadoop fs -dus`.

- Additionally to the last point: the replication factor is the number shown after the permissions flags and before the owner (the 2nd column in @adhunavkulkarni's answer). – Chris White Jul 20 '12 at 10:39
- Use `hadoop fs -du -s` for newer versions. – sbaker Nov 30 '13 at 16:51
- Use `hadoop fs -du -s -h /user/frylock/input` for a much more readable output. – axiom Dec 11 '15 at 23:23
- @axiom it returns `10.5 G 31.6 G /path`; what are these two sizes? – Dulanga Heshan Jan 13 '22 at 05:14
- @DulangaHeshan 10.5 G is the actual (raw) size of the file and 31.6 G is the space it consumes on disk including replication, which in this case is 3 (10.5 * 3 ≈ 31.6 G). – Abhishek J Jan 16 '23 at 14:08
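To illustrate the two columns discussed in the comments (using the sizes from the comment above), `hadoop fs -du -s -h` prints the raw size first and the size including replication second:
hadoop fs -du -s -h /user/frylock/input
output ---> 10.5 G  31.6 G  /user/frylock/input
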
You can use the `hadoop fs -ls` command to list the files in the current directory along with their details. The 5th column in the command output contains the file size in bytes.
For example, the command `hadoop fs -ls input` gives the following output:
Found 1 items
-rw-r--r-- 1 hduser supergroup 45956 2012-07-19 20:57 /user/hduser/input/sou
The size of the file `sou` is 45956 bytes.

- How would you output the size in human-readable form? `-ls -lah` doesn't work here. – Ivan Bilan Nov 07 '17 at 13:21
- @ivan_bilan `hadoop fs -ls -h` works. Multiple options have to be specified separately, i.e. `hadoop fs -ls -R -h` for recursive. – Mortz Jun 04 '21 at 14:00
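If you would rather read the same listing from the Java API, here is a minimal sketch (the directory path is illustrative); FileStatus.getLen() returns the same byte count shown in the 5th column of `hadoop fs -ls`:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListSizes
{
    public static void main(String[] args) throws Exception
    {
        Path dir = new Path("/user/hduser/input"); // illustrative path
        FileSystem fs = dir.getFileSystem(new Configuration());
        // one FileStatus per directory entry; getLen() is the size in bytes
        for (FileStatus status : fs.listStatus(dir))
        {
            System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
        }
    }
}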
I used the function below, which helped me get the file size.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GetflStatus
{
    public long getflSize(String args) throws IOException
    {
        Configuration config = new Configuration();
        Path path = new Path(args);
        FileSystem hdfs = path.getFileSystem(config);
        // getContentSummary() walks the whole subtree rooted at the path;
        // getLength() is the total size in bytes of all files under it
        // (raw size, not counting replication)
        ContentSummary cSummary = hdfs.getContentSummary(path);
        long length = cSummary.getLength();
        return length;
    }
}

- Can you please tell me, if this returns 7906, what is the size of that directory? Is it in bytes or in KB? – sandipchandanshive Jan 27 '16 at 15:54
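To answer the comment above: ContentSummary.getLength() returns plain bytes, so a return value of 7906 means 7906 bytes (about 7.7 KB). A minimal usage sketch (the path is illustrative):

public class GetflStatusDemo
{
    public static void main(String[] args) throws Exception
    {
        long size = new GetflStatus().getflSize("/tmp/output"); // illustrative path
        System.out.println(size + " bytes"); // e.g. 7906 -> 7906 bytes (~7.7 KB)
    }
}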
The commands below pipe `hadoop fs -du` output through an awk script to report the size (in GB) of filtered paths in HDFS:
hadoop fs -du -s /data/ClientDataNew/*A* | awk '{s+=$1} END {printf "%.3fGB\n", s/1000000000}'
output ---> 2.089GB
hadoop fs -du -s /data/ClientDataNew/*B* | awk '{s+=$1} END {printf "%.3fGB\n", s/1000000000}'
output ---> 1.724GB
hadoop fs -du -s /data/ClientDataNew/*C* | awk '{s+=$1} END {printf "%.3fGB\n", s/1000000000}'
output ---> 0.986GB
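Note that dividing by 1000000000 gives decimal gigabytes; if you want binary gibibytes instead, divide by 1024^3 = 1073741824, along the same lines:
hadoop fs -du -s /data/ClientDataNew/*A* | awk '{s+=$1} END {printf "%.3fGiB\n", s/1073741824}'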

- 46,058
- 19
- 106
- 116

- 147
- 1
- 1
hdfs dfs -du -s -h /directory
This is the human-readable version; without -h the sizes are printed as plain byte counts, which are much larger numbers and harder to read.
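Illustrative output (the values are invented; on recent Hadoop releases the second column is the space consumed including replication):
hdfs dfs -du -s -h /directory
output ---> 2.1 G  6.3 G  /directory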

- 1,974
- 24
- 19
If you want to do it through the API, you can use the `getFileStatus()` method.

- It's not right: it doesn't return the file size, it returns the allocated block size, which won't be zero for empty files. The default is 67108864. – user1613360 Nov 30 '14 at 06:23
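A minimal sketch (the path is illustrative): FileStatus.getLen() returns the file's actual length in bytes, while FileStatus.getBlockSize(), which the comment above appears to describe, returns the configured HDFS block size (67108864 bytes, i.e. 64 MB, was the old default) rather than the space the file actually uses:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileSize
{
    public static void main(String[] args) throws Exception
    {
        Path path = new Path("/user/hduser/input/sou"); // illustrative path
        FileSystem fs = path.getFileSystem(new Configuration());
        FileStatus status = fs.getFileStatus(path);
        System.out.println(status.getLen() + " bytes");       // actual file length
        System.out.println(status.getBlockSize() + " block"); // configured block size, not bytes used
    }
}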
If you want to know the size of each file inside the directory, add an asterisk ('*') at the end:
hadoop fs -du -s -h /tmp/output/*
I hope this helps your purpose.
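Illustrative output (file names and sizes invented):
hadoop fs -du -s -h /tmp/output/*
output ---> 12.4 M  37.2 M  /tmp/output/part-00000
output ---> 11.9 M  35.7 M  /tmp/output/part-00001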
