I want to know what the two output columns of hadoop fs -du mean. It's not clear from the documentation:

In [16]: subprocess.call(["hadoop", "fs", "-du", "-h", "/project/crm/warehouse/"])

Output:

5.9 G 17.8 G /project/crm/warehouse/n98770_patron_1

What's the real size of the path? 5.9 GB or 17.8 GB?

Thank you

SCouto
Carmen Pérez Carrillo

1 Answer

The first column is the actual size of the file or directory, while the second is the disk space it consumes once replication is taken into account.

Since HDFS replicates your data, the second field shows how much total disk space the data takes up across all of its replicas.

In this case the total space consumed is 17.8 G and the base size is 5.9 G.

17.8 / 5.9 is roughly 3.

This means your HDFS cluster is using a replication factor of 3 (the default value).

If your replication factor were 2, the output would instead be roughly:

5.9 G 11.8 G /project/crm/warehouse/n98770_patron_1
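The ratio check above can be sketched in Python, the language used in the question. This is only an illustration: the helper names `parse_du_line` and `estimate_replication` are made up for this example, and it assumes the three-column `-du -h` output shown in the question (size, disk space consumed, path) with binary unit suffixes.

```python
import re

# Multipliers for the unit suffixes seen in `hadoop fs -du -h` output.
# Binary (1024-based) units are an assumption; the ratio does not depend on it
# as long as both columns use the same convention.
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}

# Expected layout: <size> [unit] <disk space consumed> [unit] <path>
DU_LINE = re.compile(r"^\s*([\d.]+)\s*([KMGTP]?)\s+([\d.]+)\s*([KMGTP]?)\s+(\S.*)$")


def parse_du_line(line):
    """Return (logical_bytes, consumed_bytes, path) for one line of du output."""
    match = DU_LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognized du line: {line!r}")
    size, size_unit, used, used_unit, path = match.groups()
    logical = float(size) * UNITS[size_unit]
    consumed = float(used) * UNITS[used_unit]
    return logical, consumed, path.strip()


def estimate_replication(line):
    """Estimate the replication factor as consumed space / logical size."""
    logical, consumed, _ = parse_du_line(line)
    return consumed / logical


line = "5.9 G 17.8 G /project/crm/warehouse/n98770_patron_1"
print(round(estimate_replication(line)))  # 17.8 / 5.9 is about 3.02
```

Because the `-h` values are rounded, the ratio is only approximate; rounding it to the nearest integer gives the likely replication factor.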

SCouto