
I am a bit confused by the Hadoop architecture.

  1. What kind of file metadata is stored in Hadoop Namenode? From Hadoop wiki, it says Namenode stores the entire system namespace. Does information like last modified time, created time, file size, owner, permissions and etc stored in Namenode?

  2. Does datanode store any metadata information?

  3. There is only one Namenode, can the metadata data exceed the server's limit?

  4. If a user wants to download a file from Hadoop, does he have to download it from the Namenode? I found an architecture picture on the web showing that a client can write data directly to a datanode. Is that true?

Thanks!

leon
  • Please check the details on how the secondary namenode replicates file system metadata: http://www.mplsvpn.info/2012/11/hadoop-file-system-metadata-replication.html – shivlu jain, Nov 15 '12 at 18:01

6 Answers


I think the following analogy can help you better understand the HDFS architecture. Think of the Namenode as a FAT (file allocation table) plus the directory data, and of the Datanodes as dumb block devices. When you read a file from a regular file system, you consult the directory and then the FAT to get the locations of all the relevant blocks, and then read them. The same happens with HDFS: to read a file, you first ask the Namenode for the list of blocks that make up the file, and for each block you get the list of datanodes holding a replica of it. You then contact those datanodes directly and fetch the relevant blocks from them.
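The lookup-then-fetch flow above can be sketched as a toy simulation. Everything here (the dictionaries, the host and block names) is purely illustrative and not the real Hadoop API:

```python
# Toy sketch of the HDFS read path: the "namenode" maps a file to an ordered
# list of block IDs, and each block ID maps to the datanodes holding a
# replica. Datanodes are the "dumb block devices" that just store bytes.

namenode_namespace = {
    "/logs/app.log": ["blk_1", "blk_2"],       # file -> ordered block list
}
namenode_block_locations = {
    "blk_1": ["datanode-a", "datanode-c"],     # block -> replica hosts
    "blk_2": ["datanode-b", "datanode-a"],
}
datanode_storage = {
    "datanode-a": {"blk_1": b"hello ", "blk_2": b"world"},
    "datanode-b": {"blk_2": b"world"},
    "datanode-c": {"blk_1": b"hello "},
}

def read_file(path):
    """Step 1: ask the namenode for the block list and locations.
    Step 2: fetch each block directly from a datanode that holds it."""
    data = b""
    for block_id in namenode_namespace[path]:
        host = namenode_block_locations[block_id][0]  # pick any replica
        data += datanode_storage[host][block_id]
    return data

print(read_file("/logs/app.log"))  # b'hello world'
```

Note that the file contents never pass through the namenode; it only hands out metadata.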

David Gruzman
  1. The fsimage on the name node is in a binary format. Use the "Offline Image Viewer" to dump the fsimage in a human-readable format. The output of this tool can be further analyzed with pig or some other tool to get more meaningful data.

http://hadoop.apache.org/hdfs/docs/r0.21.0/hdfs_imageviewer.html

Praveen Sripati
  • The above link has expired. Here is a link to the image viewer for HDFS 2.7.5: http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html – Parth Shah Feb 08 '18 at 00:24
  1. yes
  2. no, apart from the blocks themselves
  3. yes, if you have many small files
  4. no, the info about the file is on the Namenode, the file itself is on Datanodes (a datanode could in theory be on the same machine, and often is on smaller clusters)
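The "many small files" caveat in point 3 can be made concrete with a back-of-envelope estimate. The ~150 bytes of namenode heap per namespace object (file or block) used below is a commonly quoted rule of thumb, not an exact figure:

```python
# Rough sketch of why many small files strain a single namenode: metadata
# for every file and every block lives in the namenode's heap.

BYTES_PER_OBJECT = 150  # rule-of-thumb heap cost per file or block object

def namenode_heap_bytes(num_files, blocks_per_file):
    """Estimate namenode heap used by metadata: one object per file
    plus one object per block of that file."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

# 100 million single-block (small) files:
many_small = namenode_heap_bytes(100_000_000, 1)   # ~30 GB of heap
# the same 100 million blocks packed into 1 million large files:
few_large = namenode_heap_bytes(1_000_000, 100)    # ~15 GB of heap
print(many_small / 1e9, few_large / 1e9)
```

Same amount of data on the datanodes, but the small-file layout roughly doubles the namenode's memory footprint.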
johndodo

3) When the number of files is very large, a single Namenode cannot keep all the metadata; in fact, that is one of the known limitations of HDFS. Have a look at HDFS Federation, which addresses this problem by splitting the namespace into several namespaces served by different namenodes.

4)

Read process:
a) The client first asks the namenode which datanodes hold the actual data
b) It then contacts those datanodes directly to read the data

Write process:
a) The client asks the namenode for datanodes to write to, and the namenode returns a set of them if available
b) The client then goes directly to those datanodes and writes the data
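The two write steps above can be sketched as a toy simulation (the names and data structures are illustrative only, not the real Hadoop client API):

```python
# Toy sketch of the HDFS write path: the client asks the "namenode" to
# allocate a block and pick target datanodes, then sends the bytes
# straight to those datanodes.

import itertools

block_counter = itertools.count(1)
datanodes = {"dn-a": {}, "dn-b": {}, "dn-c": {}}
namespace = {}  # namenode state: file -> [(block_id, [replica hosts])]

def namenode_allocate(path, replication=2):
    """Step (a): the namenode assigns a new block ID and target datanodes."""
    block_id = f"blk_{next(block_counter)}"
    targets = sorted(datanodes)[:replication]  # real HDFS placement is smarter
    namespace.setdefault(path, []).append((block_id, targets))
    return block_id, targets

def client_write(path, data):
    """Step (b): the client writes the bytes directly to each target datanode."""
    block_id, targets = namenode_allocate(path)
    for host in targets:
        datanodes[host][block_id] = data
    return block_id

client_write("/tmp/demo.txt", b"some bytes")
```

As with reads, the payload flows between the client and the datanodes; the namenode only records which blocks belong to the file and where they live.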
hari_sree

For question number 4: the client does write data directly to the Datanodes. However, before it can write to a Datanode, it needs to talk to the Namenode to obtain metadata such as which Datanodes and which block to write to.

Jing Wang
  1. Yes, the NameNode manages all of these. This metadata is also regularly persisted to the fsimage and edits files on the NameNode's local disk.

  2. No, all of the metadata is maintained by the NameNode, which keeps the metadata-management burden off the datanodes.

  3. There is only one primary NameNode. As noted above, to keep the metadata manageable, it is periodically saved to fsimage and edits through checkpointing.

  4. The client can contact the DataNodes directly once it gets the file's block information from the NameNode.

Nandakishore