3

The problem: I had 3 datanodes when I created the cluster, and a few days ago I added two more.

After adding them, I ran the balancer. It finished quickly and reported that the cluster was balanced.

However, I found that whenever I put data (about 30 MB) into the cluster, the datanodes use a lot of bandwidth (about 400 Mbps) sending and receiving data between the old datanodes and the new ones.

Could someone tell me what the possible reason is?

Maybe I did not describe the problem clearly, so here are two graphs (from Zabbix). hadoop-02 is one of the "old" datanodes, and hadoop-07 is one of the "new" datanodes.

zhaozhi

2 Answers

2
  1. If you mean network traffic: HDFS uses a write pipeline. Assuming the replication factor is 3, the data flow is

    client --> Datanode_1 --> Datanode_2 --> Datanode_3

    If the data size is 30 MB, the overall traffic is 90 MB plus a little overhead (for connection setup, packet headers, and data checksums in packets).

  2. If you mean traffic rate: I believe HDFS currently has no bandwidth throttling between client <--> DN or DN <--> DN. It will use as much bandwidth as it can get.
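The pipeline arithmetic in point 1 can be sketched as a small calculation. This is only a rough estimate under stated assumptions: it ignores protocol overhead, treats 1 MB as 8 Mb, and the function names `pipeline_traffic_mb` and `seconds_at_rate` are made up for illustration and are not part of any Hadoop API.

```python
def pipeline_traffic_mb(file_size_mb: float, replication: int = 3) -> dict:
    """Estimate payload traffic for one HDFS pipeline write (overhead ignored)."""
    # One hop from the client to the first DataNode, then one hop per
    # additional replica (DataNode -> DataNode).
    client_to_dn = file_size_mb
    dn_to_dn = file_size_mb * (replication - 1)
    return {
        "client_to_dn_mb": client_to_dn,
        "dn_to_dn_mb": dn_to_dn,
        "total_mb": client_to_dn + dn_to_dn,
    }


def seconds_at_rate(traffic_mb: float, rate_mbps: float) -> float:
    """Time to move traffic_mb megabytes at rate_mbps megabits per second."""
    return traffic_mb * 8 / rate_mbps


if __name__ == "__main__":
    traffic = pipeline_traffic_mb(30, replication=3)
    print(traffic)
    # How long the DataNode-to-DataNode part would take at the observed rate.
    print(seconds_at_rate(traffic["dn_to_dn_mb"], 400))
```

Under these assumptions, a single 30 MB write moves about 60 MB between datanodes, which takes on the order of a second at 400 Mbps. If the graphs show that rate sustained for much longer, the write pipeline alone is unlikely to account for it.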

If you noticed more data flowing between the old datanodes and the new ones, it might be because some blocks were under-replicated before. After you add new nodes, the NameNode periodically schedules replication tasks from the old DNs to other DNs (not necessarily the new ones).

waltersu
  • Thank you for your reply! I mean the network traffic. And I have fixed the under-replicated blocks problem by following this approach: https://community.hortonworks.com/articles/4427/fix-under-replicated-blocks-in-hdfs-manually.html – zhaozhi May 17 '16 at 08:10
  • The instructions you posted are wrong. setrep doesn't fix under-replicated blocks. If the file already has 3 replicas, setting replFactor=3 will of course resolve the under-replication. But if the file has 2 or fewer replicas and replFactor is already 3 by default, setting replFactor=3 is a no-op. The file is still under-replicated, and the NameNode will slowly schedule replication tasks from the old DNs to the new DNs. Your posted pics make me more confident that this is what is happening. I'm familiar with the HDFS code: the only heavy traffic between two DNs comes from either pipeline writes or block replication tasks. – waltersu May 21 '16 at 05:12
  • To continue my previous comment: the balancer's copy-and-delete is similar to block replication. But your balancer finished quickly, so I rule that possibility out, because copying only happens while the balancer is running, unless your balancer stopped abnormally. I saw that the data flows from hadoop-02 to hadoop-07. You might want to check the DN logs to see what is happening. – waltersu May 21 '16 at 05:32
0

Hold on! Are you saying that the bandwidth is over-utilized during the data transfer, or that the DNs were not balanced after you put the data? I ask because the balancer is used to balance the amount of data stored on the nodes in the cluster.

Ishan Kumar