
I am trying to understand the Namenode, and I have been referring to online material as well as the book Hadoop: The Definitive Guide.

I understand that the Namenode has concepts such as "edit logs" and "fsimage", and I can see the following files on my Namenode.

========================================================================

-rw-rw-r-- 1 vevaan24 vevaan24 1048576 Nov 23 22:53 edits_0000000000000000001-0000000000000000001
-rw-r--r-- 1 root     root     1048576 Nov 23 23:42 edits_0000000000000000002-0000000000000000002
-rw-rw-r-- 1 vevaan24 vevaan24 1048576 Nov 24 00:07 edits_0000000000000000003-0000000000000000003
-rw-rw-r-- 1 vevaan24 vevaan24 1048576 Nov 24 21:03 edits_0000000000000000004-0000000000000000004
-rw-rw-r-- 1 vevaan24 vevaan24 1048576 Nov 24 22:59 edits_0000000000000000005-0000000000000000005
-rw-r--r-- 1 root     root     1048576 Nov 24 23:00 edits_0000000000000000006-0000000000000000006
-rw-rw-r-- 1 vevaan24 vevaan24 1048576 Nov 25 21:15 edits_0000000000000000007-0000000000000000007
-rw-rw-r-- 1 vevaan24 vevaan24 1048576 Nov 25 21:34 edits_0000000000000000008-0000000000000000008
-rw-r--r-- 1 root     root     1048576 Nov 26 02:13 edits_inprogress_0000000000000000009
-rw-rw-r-- 1 vevaan24 vevaan24     355 Nov 25 21:15 fsimage_0000000000000000006
-rw-rw-r-- 1 vevaan24 vevaan24      62 Nov 25 21:15 fsimage_0000000000000000006.md5
-rw-r--r-- 1 root     root         355 Nov 26 00:12 fsimage_0000000000000000008
-rw-r--r-- 1 root     root          62 Nov 26 00:12 fsimage_0000000000000000008.md5
-rw-r--r-- 1 root     root           2 Nov 26 00:12 seen_txid
-rw-rw-r-- 1 vevaan24 vevaan24     201 Nov 26 00:12 VERSION

The book mentions that the fsimage doesn't store block locations.

I have the following questions:

1) Do the edit logs store block locations as well (for new transactions)?

2) When the Namenode and Datanodes are restarted, how does the Namenode get the block addresses? My doubt is that the NN reads the fsimage to reconstruct the filesystem info, but the fsimage doesn't contain block locations, so how is this information reconstructed?

3) Is it true that the fsimage stores only the BLOCK ID, and if so, is the BLOCK ID unique across Datanodes? Is the BLOCK ID the same as the BLOCK address?

CuriousMind

1 Answer


Block locations, i.e., the datanodes on which the blocks are stored, are persisted neither in the fsimage file nor in the edit log. The Namenode keeps this mapping only in memory.

It is the responsibility of each datanode to keep track of the list of blocks it stores.

During restart, the Namenode loads the fsimage file into memory and applies the edits from the edit log. The missing block location information is obtained from the datanodes as they check in with their block lists. From these block lists, the Namenode constructs the mapping of blocks to their locations in memory.
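The restart sequence above can be sketched as a toy model. This is only an illustration in Python, not Hadoop's actual code: the function names, the edit-log operations, the paths, and the datanode names are all made up for the example.

```python
from collections import defaultdict

def load_namespace(fsimage, edits):
    """Rebuild the file -> block IDs view from a checkpoint plus logged edits."""
    namespace = {path: list(blocks) for path, blocks in fsimage.items()}
    for op, path, block_id in edits:
        if op == "add_block":          # hypothetical edit-log operation
            namespace.setdefault(path, []).append(block_id)
        elif op == "delete":           # hypothetical edit-log operation
            namespace.pop(path, None)
    return namespace

def apply_block_reports(reports):
    """Build the in-memory block ID -> datanodes map from block reports."""
    locations = defaultdict(set)
    for datanode, block_ids in reports.items():
        for block_id in block_ids:
            locations[block_id].add(datanode)
    return locations

# Persisted state: file metadata and block IDs, but no locations.
fsimage = {"/a.txt": [1001, 1002]}
edits = [("add_block", "/b.txt", 1003)]
namespace = load_namespace(fsimage, edits)

# Each datanode checks in and reports the block IDs it holds.
reports = {"dn1": [1001, 1003], "dn2": [1001, 1002], "dn3": [1002, 1003]}
locations = apply_block_reports(reports)

for path, blocks in sorted(namespace.items()):
    for block_id in blocks:
        print(path, block_id, sorted(locations[block_id]))
```

The point of the sketch is that `locations` is never read from disk: it exists only after the datanodes have reported in.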

The fsimage has more than the Block ID. It holds information such as the blocks of each file, block size, replication factor, access time, modification time, and file permissions, but not the locations of the blocks.
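As a rough picture of the per-file information listed above, here is a hypothetical record type in Python. The class name, field names, and values are assumptions chosen for illustration and do not reflect the real fsimage on-disk format.

```python
from dataclasses import dataclass, fields
from typing import List

@dataclass
class FileRecord:
    """Hypothetical per-file entry; note there is no locations field."""
    path: str
    block_ids: List[int]
    block_size: int
    replication: int
    access_time: int
    modification_time: int
    permissions: str

record = FileRecord(
    path="/a.txt",
    block_ids=[1001, 1002],
    block_size=128 * 1024 * 1024,
    replication=3,
    access_time=0,          # placeholder timestamps
    modification_time=0,
    permissions="rw-r--r--",
)
field_names = [f.name for f in fields(FileRecord)]
print(field_names)
```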

Yes, Block IDs are unique (within a block pool). A block address, by contrast, refers to the address of the datanodes on which the block resides.
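To make the difference between a block ID and a block address concrete, here is a minimal sketch. It assumes, as noted in the comments below, that block IDs are unique per block pool, so the lookup key is the pair of pool and ID. The pool names, datanode names, and the `locate` helper are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockKey:
    """Identity of a block: unique only in combination with its block pool."""
    block_pool_id: str   # hypothetical pool name
    block_id: int

# In-memory map from block identity to block "addresses" (datanodes).
locations = {
    BlockKey("BP-1", 1001): ["dn1", "dn2"],
    BlockKey("BP-2", 1001): ["dn3"],   # same ID, different pool
}

def locate(block_pool_id, block_id):
    """Return the datanodes currently known to hold the given block."""
    return locations.get(BlockKey(block_pool_id, block_id), [])

print(locate("BP-1", 1001))
print(locate("BP-2", 1001))
```

The ID says which block is meant; the list of datanodes says where its replicas currently live, and that list can change without the ID changing.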

franklinsijo
  • Thanks so much for your detailed info. However, there is one thing I am still not able to understand. A Datanode has data in the form of blocks, but how does the DN know which file a given block belongs to? I agree that the DN provides block information to the NN, but how does this help in locating the block address for a given file on a DN? – CuriousMind Apr 14 '17 at 10:46
  • Blocks are identified by their block IDs, which are unique. Datanodes hold this information. – franklinsijo Apr 14 '17 at 10:47
  • So does this mean the DN sends the block ID to the NN, and since the fsimage has the block ID info, the NN does the mapping based on the block ID? If this is correct, then it makes sense. And for this to work, block IDs need to be unique across the whole HDFS cluster. Is this understanding correct? – CuriousMind Apr 14 '17 at 10:50
  • Yes. That is what I have explained in the answer. For every block pool, the block IDs are unique. – franklinsijo Apr 14 '17 at 10:53
  • Thanks so much, got the insight of this. Thx a ton! – CuriousMind Apr 14 '17 at 10:55