First of all, thank you for reading my question!
I'm currently studying the replication model of Hadoop, but I'm at a dead end. I'm studying from the book "Hadoop: The Definitive Guide", 3rd Edition, O'Reilly, January 2012. To get to my question, I first need to quote the text below from the book.
On page 73, there is the following:
"The DistributedFileSystem returns an FSDataOutputStream for the client The Hadoop Distributed Filesystem to start writing data to. Just as in the read case, FSDataOutputStream wraps a DFSOutput Stream, which handles communication with the datanodes and namenode. As the client writes data (step 3),
DFSOutputStream splits it into packets, which it writes to an internal queue, called the data queue. The data queue is consumed by the Data Streamer, whose responsibility it is to ask the namenode to allocate new blocks by picking a list of suitable datanodes to store the replicas."*
As you can see, the DFSOutputStream has a data queue of packets. The data queue is consumed by the DataStreamer, which asks the namenode to allocate new blocks.
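To make the context concrete, this is the kind of client code I have in mind while reading that paragraph (the namenode address and file path are just made up for illustration). All of the packet queueing and block allocation the book describes happens behind this simple API:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical namenode address, only for illustration
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        FileSystem fs = FileSystem.get(conf);

        // create() contacts the namenode and returns an FSDataOutputStream,
        // which wraps a DFSOutputStream (steps 1-2 in the book's figure)
        Path path = new Path("/user/example/replication-test.txt");
        try (FSDataOutputStream out = fs.create(path)) {
            // Step 3: as the client writes, DFSOutputStream splits the data
            // into packets and places them on its internal data queue; the
            // DataStreamer consumes that queue and asks the namenode for
            // blocks and target datanodes behind the scenes
            out.writeUTF("hello replication");
        }
        // close() flushes the remaining packets and waits for acknowledgements
    }
}
```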
My questions:
How does this work?
How does the Namenode allocate new blocks?
Same question, asked differently: how does the Namenode create a list of suitable Datanodes?
I can't find anything about this on the internet or in the book; the book only explains the process at a high level.
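To show what I mean, here is a rough sketch of what I imagine the namenode could be doing when it picks datanodes. This is purely my guess, not Hadoop's actual implementation; the class and method names (DatanodeInfo, chooseTargets, and so on) are made up for the sake of the question:

```java
import java.util.ArrayList;
import java.util.List;

// Purely hypothetical sketch of what block placement *might* look like;
// not Hadoop's real API or policy.
class NaivePlacementSketch {

    static class DatanodeInfo {
        String host;
        String rack;
        long freeSpaceBytes;

        DatanodeInfo(String host, String rack, long freeSpaceBytes) {
            this.host = host;
            this.rack = rack;
            this.freeSpaceBytes = freeSpaceBytes;
        }
    }

    // Pick `replication` datanodes for a new block: only consider nodes with
    // enough free space, and try not to put every replica on the same rack.
    static List<DatanodeInfo> chooseTargets(List<DatanodeInfo> liveNodes,
                                            int replication,
                                            long blockSize) {
        List<DatanodeInfo> chosen = new ArrayList<>();
        for (DatanodeInfo node : liveNodes) {
            if (chosen.size() == replication) {
                break;
            }
            boolean hasSpace = node.freeSpaceBytes >= blockSize;
            boolean newRack = chosen.stream()
                                    .noneMatch(c -> c.rack.equals(node.rack));
            // First replica: any node with space; later replicas: prefer other racks
            if (hasSpace && (chosen.isEmpty() || newRack)) {
                chosen.add(node);
            }
        }
        return chosen;
    }
}
```

Is the real decision anything like this, or does the namenode use completely different criteria?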
I really appreciate your time helping me. Thank you!