I am having a problem with HDFS (Hadoop 2.7.3). The cluster has 2 namenodes (1 active, 1 standby) and 3 datanodes, and the replication factor is 3.
$ hdfs dfs -df -h /
Filesystem Size Used Available Use%
hdfs://hadoop-cluster 131.0 T 51.3 T 79.5 T 39%
So the used space is 51.3 T according to the `-df` command.
$ hdfs dfs -du -h /
912.8 G /dir1
2.9 T /dir2
But the used space is only about 3.8 T according to the `-du` command.
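To show how far apart the two numbers are: as I understand it, `-du` reports pre-replication (logical) file sizes while `-df` reports raw bytes consumed on the datanode disks, so `du` times the replication factor should roughly match `df`. Here it doesn't, even after accounting for replication (the arithmetic below just uses the figures from the outputs above):

```shell
# My assumption: raw usage ~= logical size * replication factor.
logical_tb=3.8     # 912.8 G (/dir1) + 2.9 T (/dir2), rounded
replication=3
awk -v l="$logical_tb" -v r="$replication" 'BEGIN { printf "%.1f TB\n", l * r }'
# prints 11.4 TB -- still nowhere near the 51.3 T that -df reports
```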
I found that one of the datanodes has reached 100% disk usage.
Live datanodes (3):
datanode1:
Configured Capacity: 48003784114176 (43.66 TB)
DFS Used: 2614091989729 (2.38 TB)
Non DFS Used: 95457946911 (88.90 GB)
DFS Remaining: 45294174318384 (41.19 TB)
DFS Used%: 5.45%
DFS Remaining%: 94.36%
datanode2:
Configured Capacity: 48003784114176 (43.66 TB)
DFS Used: 48003784114176 (43.66 TB)
Non DFS Used: 0
DFS Remaining: 0
DFS Used%: 100%
DFS Remaining%: 0%
datanode3:
Configured Capacity: 48003784114176 (43.66 TB)
DFS Used: 2615226250042 (2.38 TB)
Non DFS Used: 87496531142 (81.49 GB)
DFS Remaining: 45301001735984 (41.20 TB)
DFS Used%: 5.45%
DFS Remaining%: 94.37%
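(The per-node figures above are from the datanode report, obtained the standard way:)

```shell
# Prints configured capacity, DFS Used, Non DFS Used, and remaining space
# for each live datanode, as shown above.
hdfs dfsadmin -report
```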
My questions are:
- I tried running the balancer. It seems to work, but no block is moved in any iteration, and it exits without any error. How can I balance the disk usage across the datanodes? Why does the `hdfs balancer` command not move any blocks?
19/11/06 11:27:51 INFO balancer.Balancer: Decided to move 10 GB bytes from datanode2:DISK to datanode3:DISK
19/11/06 11:27:51 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
19/11/06 11:27:51 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
19/11/06 11:27:51 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
19/11/06 11:27:51 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
19/11/06 11:27:51 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
19/11/06 11:27:51 INFO balancer.Balancer: Will move 10 GB in this iteration
19/11/06 11:27:51 INFO balancer.Dispatcher: Limiting threads per target to the specified max.
19/11/06 11:27:51 INFO balancer.Dispatcher: Allocating 5 threads per target.
No block has been moved for 5 iterations. Exiting...
- Although `datanode2` is full, the status of the node is still shown as "In-service" / "Live" / "Normal". Naturally, I can't write new data to HDFS in this situation.
- The result of `-df` and the result of `-du` are wildly different. Why?
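Coming back to the balancer question: for reference, these are the balancer knobs I'm aware of in 2.7.3 (the values below are examples, not the settings I actually used):

```shell
# Raise the per-datanode bandwidth cap for block moves (bytes/sec; 100 MB/s here).
hdfs dfsadmin -setBalancerBandwidth 104857600

# Re-run the balancer with an explicit utilization threshold: keep balancing
# until every datanode is within 5% of the cluster-average DFS usage.
hdfs balancer -threshold 5
```

Even with a lower threshold, the run ends with "No block has been moved for 5 iterations. Exiting..." as in the log above.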