From "White, Tom; Hadoop: The Definitive Guide; Ch. 3, The Hadoop Distributed Filesystem, Anatomy of a File Write":

...The DataStreamer streams the packets to the first datanode in the pipeline, which stores each packet and forwards it to the second datanode in the pipeline. Similarly, the second datanode stores the packet and forwards it to the third (and last) datanode in the...

It doesn't mention how a datanode knows which datanode is next in the pipeline, i.e. where it has to send the packets.

U880D
Dipperman

1 Answer

The Namenode knows all of the datanodes and their rack placements. Datanodes don't know about one another.

During a write, the client contacts the Namenode first, and the Namenode returns the addresses of the datanodes where the replicas are to be written.
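To make the forwarding idea concrete, here is a toy simulation. It is only a sketch of my reading, under the assumption that the write request carries the list of remaining downstream datanodes along with the data; `ToyDataNode`, `write_packet`, and `client_write` are made-up names, not Hadoop classes or APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataNode:
    """Hypothetical stand-in for a datanode; not a Hadoop class."""
    name: str
    stored: list = field(default_factory=list)

    def write_packet(self, packet, downstream, cluster):
        # Store the packet locally first.
        self.stored.append(packet)
        # Assumption: the request tells this node who comes next.
        # Forward the packet to the next node, passing on the rest of the list.
        if downstream:
            next_name, rest = downstream[0], downstream[1:]
            cluster[next_name].write_packet(packet, rest, cluster)


def client_write(packets, pipeline, cluster):
    """The client talks only to the first node; the target list travels with the request."""
    first, rest = pipeline[0], pipeline[1:]
    for packet in packets:
        cluster[first].write_packet(packet, rest, cluster)


# Four toy nodes; only three of them are in the pipeline.
cluster = {n: ToyDataNode(n) for n in ("dn1", "dn2", "dn3", "dn4")}
pipeline = ["dn1", "dn2", "dn3"]  # stands in for the list the Namenode returns
client_write([b"p0", b"p1"], pipeline, cluster)
```

In this sketch no node keeps any standing knowledge of the others: each one learns its successor only from the request it receives, and `dn4` never sees any data.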

Related question - Hadoop Replication Model - DataStreamer/Namenode

Regarding which addresses are used,

HDFS’s placement policy is to put one replica on the local machine if the writer is on a datanode, otherwise on a random datanode in the same rack as that of the writer, another replica on a node in a different (remote) rack, and the last on a different node in the same remote rack

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Replica_Placement:_The_First_Baby_Steps
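As a rough illustration of the quoted policy (not the actual Namenode code), the selection could look something like the sketch below. The function `choose_targets`, its parameters, and the `racks` dictionary layout are all hypothetical, and it assumes three replicas and at least one remote rack with two or more datanodes.

```python
import random


def choose_targets(writer, writer_rack, racks, rng=random):
    """Pick three replica targets following the quoted policy (toy version).

    racks maps a rack name to the list of datanode names in that rack.
    """
    # First replica: the writer itself if it is a datanode,
    # otherwise a random datanode in the writer's rack.
    if writer in racks[writer_rack]:
        first = writer
    else:
        first = rng.choice(racks[writer_rack])

    # Second and third replicas: two different nodes in one remote rack.
    remote_racks = sorted(
        r for r, nodes in racks.items() if r != writer_rack and len(nodes) >= 2
    )
    if not remote_racks:
        raise ValueError("this sketch needs a remote rack with at least two datanodes")
    remote = rng.choice(remote_racks)
    second, third = rng.sample(racks[remote], 2)

    return [first, second, third]


racks = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
targets = choose_targets("dn1", "rack1", racks, random.Random(0))
```

Here `targets` starts with `dn1` (the writer is a datanode) and the other two entries are the two nodes of `rack2`, in an order that depends on the random generator.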

OneCricketeer
  • Since it is a pipeline, how does the first datanode know which datanode is next, i.e. where it has to send the data? – Dipperman Sep 18 '19 at 10:39
  • As mentioned, the Namenode returns a list of addresses using the logic I quoted. And the datanode doesn't know. The client writing the data does and just forwards that data along – OneCricketeer Sep 18 '19 at 10:48
  • The client sends the data only to the first datanode. After that, the datanodes send the data to each other. How do the datanodes know where they have to send the data if the list has been sent only to the client? – Dipperman Sep 18 '19 at 11:06
  • As you've copied from your source, all necessary information is *forwarded* in the write request. I'm not sure what more you're looking for. – OneCricketeer Sep 18 '19 at 11:08
  • So is it the client that sends the list of datanode addresses to the first datanode? – Dipperman Sep 18 '19 at 11:09
  • That's my interpretation of the documentation, yes. I do not have proof of this. – OneCricketeer Sep 18 '19 at 13:00