
Suppose I have a file of 50 MB and my default HDFS block size is 64 MB. After storing this 50 MB file, we are left with 14 MB of the block, which could be used to store some other file. The namenode keeps track of the block information for the files present in HDFS. So in this case there would be 2 records pointing to the same block: one for the 50 MB file and one for the 14 MB file.

My question is: how does the namenode keep track of these 2 records and provide metadata about the files, if the 2 files are pointing to the same block?

2 Answers


Your assumption of having several files per block is wrong. One block can store data from only one file, but one file can be stored across multiple blocks (if its size is bigger than the block size). So the Namenode maps at most one file per block.

Note that disk space is only used for the actual data size, not for the entire block size; each file and block still has its own metadata entry, which is why having many small files can impact the memory of the Namenode.
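To make the arithmetic concrete, here is a small sketch (not Hadoop code; the 64 MB block size and file sizes are illustrative assumptions) of how a single file maps onto blocks, with only the last block partially filled:

```python
# Sketch (not the Hadoop API): how one file maps to HDFS blocks.
# A block holds data from exactly one file, and only the bytes
# actually written consume disk space -- the last block is not
# padded out to the full block size.

def block_layout(file_size_mb, block_size_mb=64):
    """Return the sizes of the blocks a file occupies."""
    blocks = []
    remaining = file_size_mb
    while remaining > 0:
        # Each block belongs to this file alone.
        blocks.append(min(remaining, block_size_mb))
        remaining -= block_size_mb
    return blocks

# The 50 MB file from the question: one block, 50 MB on disk.
print(block_layout(50))    # [50]
# A 130 MB file: three blocks, the last holding only 2 MB.
print(block_layout(130))   # [64, 64, 2]
```

The 14 MB "left over" in the question's 64 MB block is never handed to another file; it is simply unused, so the Namenode only ever records one file per block.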

Serhiy

First, a file is not processed according to block size; it is processed according to input splits. The file is logically divided into smaller parts called input splits, while physically it is saved in blocks.

Second, yes, there can be a case where more than one input split overlaps a block. These splits rely on the EOL (end-of-line) marker to determine record boundaries, which helps the application master keep track of them and, in turn, update the namenode.
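The EOL handling can be sketched as follows. This is an assumption-laden illustration (plain Python, not Hadoop's actual `LineRecordReader`): a reader whose split does not start at offset 0 skips the partial first line, because the previous split's reader reads past its own end to finish that record:

```python
# Sketch (not Hadoop code): how a line-based record reader assigns
# complete records to byte-range splits, in the spirit of Hadoop's
# LineRecordReader. Offsets and data are illustrative assumptions.

def read_split(data: bytes, start: int, end: int):
    """Yield the complete lines belonging to the split [start, end)."""
    pos = start
    if start != 0:
        # The previous split's reader owns the line we land inside,
        # so skip ahead to the first newline.
        nl = data.find(b"\n", start)
        pos = nl + 1 if nl != -1 else len(data)
    while pos < end:
        # Read past `end` if needed to finish the current record.
        nl = data.find(b"\n", pos)
        if nl == -1:
            yield data[pos:]
            return
        yield data[pos:nl]
        pos = nl + 1

data = b"alpha\nbravo\ncharlie\n"
# Two splits of 10 bytes each; "bravo" straddles the boundary.
print(list(read_split(data, 0, 10)))   # [b'alpha', b'bravo']
print(list(read_split(data, 10, 20)))  # [b'charlie']
```

No record is lost or duplicated even though the split boundary falls mid-line, which is why splits are a logical division independent of block boundaries.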

For more details refer to https://hadoopi.wordpress.com/2013/05/27/understand-recordreader-inputsplit/ — this should clear all your doubts.

siddhartha jain