Where does the NameNode get information about the DataNodes?

Question

When a file is saved to HDFS, it is split into blocks that are stored across the DataNodes, and the corresponding information is recorded in the edit log. That part is clear to me.

My question is: when I send a read request to the NameNode, where does it look up the DataNode details?
In the fsimage or in the edit log?
If it reads from the fsimage, a new fsimage is only generated at one-hour intervals.
What happens if I make the request before that interval has elapsed?

A new fsimage is generated with the incremental data, and the old data is carried over into the new one as well. — Manish Pansari, Jan 07 '18 at 16:25
Thanks for your comment. My doubt is this: suppose I store a new large file at 8:30. That information is recorded in the edit log but not yet in the fsimage. If I then ask the NameNode to read the same file at 8:31, before the checkpointing process has run, where does the NameNode get the information: from the fsimage or from the edit log? — , Jan 08 '18 at 03:45
Block-related metadata is stored in the fsimage, and the edit log captures a record of all activity on the HDFS file system. — Manish Pansari, Jan 08 '18 at 04:59

score 0 · Accepted Answer · answered Jan 11 '18 at 18:17

Let's break down where each bit of information about the filesystem is stored on the NameNode.

The filesystem namespace (hierarchy of directories and files) is stored entirely in memory on the NameNode. There is no on-disk caching. Everything is in memory at all times. The FsImage is used only for persistence in case of failure. It is read only on startup. The EditLog records the changes made since the last FsImage was written; again, the EditLog is read only on startup. The active NameNode will never read an FsImage or EditLog during normal operation. However, a BackupNode or Standby NameNode (depending on your configuration) will periodically combine new EditLog entries with an old FsImage to produce a new FsImage. This is done to make startup faster and to reduce the size of on-disk data structures (if no compaction were done, the EditLog would grow indefinitely).
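To make the roles of memory, FsImage, and EditLog concrete, here is a toy Python model. It is only a sketch and not HDFS code: `ToyNameNode`, `apply_edit`, `checkpoint`, and the JSON log format are all invented for illustration, and the real on-disk formats are different. It shows a write being logged and applied to memory, and a read being answered from memory before any checkpoint has happened.

```python
import json


def apply_edit(namespace, op):
    """Apply one logged operation to a path -> block-ID-list mapping."""
    if op["op"] == "add":
        namespace[op["path"]] = list(op["blocks"])
    elif op["op"] == "delete":
        namespace.pop(op["path"], None)


def checkpoint(fsimage, edit_log):
    """Fold logged edits into an old image to produce a new image."""
    new_image = {path: list(blocks) for path, blocks in fsimage.items()}
    for entry in edit_log:
        apply_edit(new_image, json.loads(entry))
    return new_image


class ToyNameNode:
    def __init__(self, fsimage=None, edit_log=None):
        # Startup is the only time the image and the log are read.
        self.edit_log = list(edit_log or [])
        self.namespace = checkpoint(fsimage or {}, self.edit_log)

    def create_file(self, path, blocks):
        op = {"op": "add", "path": path, "blocks": list(blocks)}
        self.edit_log.append(json.dumps(op))  # stands in for the disk write
        apply_edit(self.namespace, op)        # then update the in-memory state

    def get_blocks(self, path):
        # Served purely from memory; neither image nor log is consulted.
        return self.namespace[path]


nn = ToyNameNode(fsimage={"/old.txt": ["blk_1"]})
nn.create_file("/new.txt", ["blk_2", "blk_3"])  # e.g. written at 8:30
print(nn.get_blocks("/new.txt"))                # e.g. read at 8:31, before any checkpoint
```

In this model the 8:31 read succeeds because the write at 8:30 already updated the in-memory namespace. The `checkpoint` function only matters for the next startup, which is why it can run on a separate node without the active NameNode being involved in reads.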

The namespace discussed above includes the mapping from a file to the blocks contained within that file. This information is persisted in the FsImage / EditLog. However, the location of those blocks is not persisted into the FsImage. This information lives only transiently in the memory of the NameNode. On startup, the location of the blocks is reconstructed using the block reports received from all of the DataNodes. Each DataNode essentially tells the NameNode, "I have block IDs AAA, BBB, CCC, ..." and so on, and the NameNode uses these reports to construct the location of all blocks.
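The block report mechanism can be sketched in a few lines of Python. Again, this is an illustrative model under assumed names (`build_block_locations`, `locate_file`, and the `dn1`/`blk_1` identifiers are made up), not the actual NameNode data structures. It inverts per-DataNode reports into a block-to-DataNodes map and joins that with the persisted file-to-blocks mapping.

```python
from collections import defaultdict


def build_block_locations(block_reports):
    """Invert DataNode -> block IDs reports into block ID -> set of DataNodes."""
    locations = defaultdict(set)
    for datanode, block_ids in block_reports.items():
        for block_id in block_ids:
            locations[block_id].add(datanode)
    return locations


def locate_file(path, namespace, locations):
    """Join the persisted file -> blocks mapping with the transient locations."""
    return [
        (block_id, sorted(locations.get(block_id, ())))
        for block_id in namespace[path]
    ]


# Persisted part (would come from FsImage / EditLog at startup).
namespace = {"/data.txt": ["blk_1", "blk_2"]}

# Transient part (rebuilt from what each DataNode reports).
reports = {
    "dn1": ["blk_1", "blk_2"],
    "dn2": ["blk_2", "blk_3"],
    "dn3": ["blk_1", "blk_3"],
}

locations = build_block_locations(reports)
print(locate_file("/data.txt", namespace, locations))
```

Keeping the two mappings separate mirrors the split described above: the file-to-blocks part survives a restart, while the block-to-DataNodes part is rebuilt from whatever the DataNodes report.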

To answer your question simply, when you request a read operation from the NameNode, all information is read from memory. Disk I/O is only performed on a write operation, to persist the change to the EditLog.

Primary Source: HDFS Architecture Guide; also I am a contributor to HDFS core code.
