
First of all, thank you for reading my question!

I'm currently studying the replication model of Hadoop, but I'm at a dead end. I'm studying from the book "Hadoop: The Definitive Guide", 3rd Edition (O'Reilly, January 2012). To get to the question, I first need to quote the text below from the book.

On page 73, there is the following:

"The DistributedFileSystem returns an FSDataOutputStream for the client The Hadoop Distributed Filesystem to start writing data to. Just as in the read case, FSDataOutputStream wraps a DFSOutput Stream, which handles communication with the datanodes and namenode. As the client writes data (step 3),

DFSOutputStream splits it into packets, which it writes to an internal queue, called the data queue. The data queue is consumed by the Data Streamer, whose responsibility it is to ask the namenode to allocate new blocks by picking a list of suitable datanodes to store the replicas."*

As you can see, the DFSOutputStream has a data queue of packets. The data queue is consumed by the DataStreamer, which asks the namenode to allocate new blocks; a minimal sketch of this setup follows.
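To make that concrete, here is a minimal Java sketch of the producer/consumer setup described above. Packet, NamenodeStub, and the method names are my own simplified stand-ins, not the real HDFS classes:

import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Simplified sketch of the DFSOutputStream data queue idea.
class WriteSketch {

    static class Packet {
        final byte[] data;
        Packet(byte[] data) { this.data = data; }
    }

    // Stand-in for the namenode: hands out a list of datanodes per block.
    interface NamenodeStub {
        List<String> allocateBlock(String file); // datanode addresses
    }

    private final BlockingQueue<Packet> dataQueue = new LinkedBlockingQueue<>();

    // Producer side: the client's write() splits data into packets
    // and appends them to the internal data queue.
    void write(byte[] chunk) throws InterruptedException {
        dataQueue.put(new Packet(chunk));
    }

    // Consumer side: the "DataStreamer" thread drains the queue and asks
    // the namenode where the replicas of each new block should go.
    void streamerLoop(NamenodeStub namenode, String file) throws InterruptedException {
        List<String> pipeline = namenode.allocateBlock(file);
        while (!Thread.currentThread().isInterrupted()) {
            Packet packet = dataQueue.take();
            // send packet.data to pipeline.get(0), which forwards it to
            // the next datanode in the pipeline, and so on
        }
    }
}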

My questions:

How does this work?

How does the Namenode allocate new blocks?

Same question, asked differently: how does the Namenode create a list of suitable Datanodes?

I can't find anything about this on the internet or in the book; the book only explains the process at a high level.

I really appreciate your time. Thank you!

– ielkhalloufi

3 Answers


It's a pluggable, policy-based algorithm. See the Replica Placement section of the HDFS design document for more information.
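To make "pluggable" concrete, here is a minimal sketch of the strategy pattern involved. The PlacementPolicy interface and FirstNPolicy class below are illustrations I made up; the real extension point is the abstract class BlockPlacementPolicy in the HDFS source, selected (in recent Hadoop versions) via the dfs.block.replicator.classname property:

import java.util.List;

// Simplified strategy-pattern sketch of a pluggable placement policy.
// This is NOT the real HDFS API, only the shape of the idea.
interface PlacementPolicy {
    // Given the writer's host and the live datanodes,
    // pick numReplicas target nodes for a new block.
    List<String> chooseTargets(String writerHost,
                               List<String> datanodes,
                               int numReplicas);
}

class FirstNPolicy implements PlacementPolicy {
    // A deliberately trivial policy: take the first N datanodes.
    // The real default policy is rack-aware (see the other answers).
    @Override
    public List<String> chooseTargets(String writerHost,
                                      List<String> datanodes,
                                      int numReplicas) {
        return datanodes.subList(0, Math.min(numReplicas, datanodes.size()));
    }
}

The namenode instantiates whichever policy class is configured and consults it every time a client asks for a new block.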

– Chris Shain
• Thank you for your time answering my question. The reference you gave explains the default replication factor (3); how does the namenode work with a replication factor of 10 or 100? Can you give me a reference for the "pluggable, policy-based algorithm"? I've been working with Hadoop for only 1.5 months. – ielkhalloufi Jun 12 '12 at 19:30

I see two different questions here, since you also mentioned the DataStreamer.

You can find both answers in "The Hadoop Distributed File System" by Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler.

1) How are blocks requested by the client (DataStreamer)?

When a client writes, it first asks the NameNode to choose DataNodes to host replicas of the first block of the file. The client organizes a pipeline from node-to-node and sends the data. When the first block is filled, the client requests new DataNodes to be chosen to host replicas of the next block. A new pipeline is organized, and the client sends the further bytes of the file. Each choice of DataNodes is likely to be different.
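In rough Java pseudocode, that per-block loop looks like this. Namenode, Pipeline, PipelineFactory, and BLOCK_SIZE are hypothetical stand-ins, not the real DFSClient API:

import java.util.List;

// Sketch of the per-block write loop described in the paper.
class ClientWriteLoop {

    interface Namenode { List<String> addBlock(String file); }
    interface Pipeline { void send(byte[] packet); void close(); }
    interface PipelineFactory { Pipeline connect(List<String> datanodes); }

    static final long BLOCK_SIZE = 64L * 1024 * 1024; // assumed 64 MB blocks

    void writeFile(Namenode namenode, PipelineFactory factory,
                   String file, Iterable<byte[]> packets) {
        long bytesInBlock = 0;
        // first block: ask the namenode to choose datanodes, build a pipeline
        Pipeline pipeline = factory.connect(namenode.addBlock(file));
        for (byte[] packet : packets) {
            if (bytesInBlock + packet.length > BLOCK_SIZE) {
                pipeline.close();
                // block is full: request a fresh (likely different) set of
                // datanodes and organize a new node-to-node pipeline
                pipeline = factory.connect(namenode.addBlock(file));
                bytesInBlock = 0;
            }
            pipeline.send(packet);
            bytesInBlock += packet.length;
        }
        pipeline.close();
    }
}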

2) How does the NameNode create a list of suitable datanodes?

As another user already answered, Hadoop allows this policy to be configured, but the default replica placement policy is:

When a new block is created, HDFS places the first replica on the node where the writer is located, the second and the third replicas on two different nodes in a different rack, and the rest are placed on random nodes with restrictions that no more than one replica is placed at one node and no more than two replicas are placed in the same rack when the number of replicas is less than twice the number of racks.
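Here is a condensed, simplified model of that rule, including the "rest are placed on random nodes" part, which is what applies to replication factors above three (the factor of 10 or 100 asked about in the comments). The Node class and the selection code are my own sketch, not the real BlockPlacementPolicyDefault logic:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Simplified model of the default placement rule quoted above.
class DefaultPlacementSketch {

    static class Node {
        final String name;
        final String rack;
        Node(String name, String rack) { this.name = name; this.rack = rack; }
    }

    static List<Node> chooseTargets(Node writer, List<Node> cluster,
                                    int replicas, Random random) {
        List<Node> targets = new ArrayList<>();
        // 1st replica: the writer's own node (if the writer is a datanode).
        targets.add(writer);
        // 2nd and 3rd replicas: two different nodes on one remote rack.
        List<Node> remote = new ArrayList<>();
        for (Node node : cluster)
            if (!node.rack.equals(writer.rack)) remote.add(node);
        Collections.shuffle(remote, random);
        String chosenRack = remote.isEmpty() ? null : remote.get(0).rack;
        for (Node node : remote)
            if (targets.size() < Math.min(replicas, 3)
                    && node.rack.equals(chosenRack) && !targets.contains(node))
                targets.add(node);
        // Remaining replicas: random nodes, at most one replica per node.
        // (The real policy also caps the number of replicas per rack.)
        List<Node> rest = new ArrayList<>(cluster);
        rest.removeAll(targets);
        Collections.shuffle(rest, random);
        for (Node node : rest)
            if (targets.size() < replicas) targets.add(node);
        return targets;
    }
}

So with a replication factor of 10 or 100, only the first three replicas follow the fixed local-node/remote-rack rule; the remainder are spread randomly, subject to the per-node and per-rack restrictions quoted above.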

– vortex.alex

Have a look at the Apache HDFS Design document.

For example, when the replication factor is three, HDFS's placement policy is as follows (from grepcode):

/** The class is responsible for choosing the desired number of targets
 * for placing block replicas.
 * The replica placement strategy is that if the writer is on a datanode,
 * the 1st replica is placed on the local machine,
 * otherwise a random datanode. The 2nd replica is placed on a datanode
 * that is on a different rack. The 3rd replica is placed on a datanode
 * which is on a different node of the rack as the second replica.
 */
@InterfaceAudience.Private
public class BlockPlacementPolicyDefault extends BlockPlacementPolicy {

This policy cuts inter-rack write traffic, which improves write performance.

The chance of rack failure is far less than that of node failure, so this policy does not impact data reliability and availability guarantees.

With this policy, the replicas of a file are not evenly distributed across the racks: one third of the replicas are on one node, two thirds of the replicas are on one rack, and the other third are evenly distributed across the remaining racks.

This policy improves write performance without compromising data reliability or read performance.

In short: the 1st replica sits on one rack, and the 2nd and 3rd replicas sit on a different (remote) rack.

– Ravindra babu