I would like to remove a specific raw block file (and its associated .meta file) from a specific machine (DataNode) in my cluster running HDFS and move it to another specific DataNode.
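For context, this is roughly how I identify the block and find the raw files on the source DataNode (the file path, block ID, and data directory below are just illustrative examples):

```
# Find the block ID and which DataNodes hold replicas of a given file
hdfs fsck /user/me/somefile -files -blocks -locations

# On the source DataNode, locate the raw block file and its .meta file
# under dfs.datanode.data.dir (example data directory)
find /data/dfs/dn -name 'blk_1073741825*'
# -> .../current/BP-.../current/finalized/subdir0/subdir0/blk_1073741825
# -> .../current/BP-.../current/finalized/subdir0/subdir0/blk_1073741825_1001.meta
```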
It's possible to accomplish this if I stop HDFS entirely, move the block files manually, and restart it; the block then shows up in the new location fine. However, I would like to do this without stopping the whole cluster.
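For reference, the whole-cluster procedure that does work looks roughly like this (hostnames and paths are examples; the destination must be a finalized subdirectory under that DataNode's own dfs.datanode.data.dir):

```
# Stop the whole HDFS cluster
$HADOOP_HOME/sbin/stop-dfs.sh

# Copy the block file and its .meta file to the destination DataNode,
# then remove them from the source (example hosts/paths)
scp /data/dfs/dn/current/BP-.../current/finalized/subdir0/subdir0/blk_1073741825* \
    dn2:/data/dfs/dn/current/BP-.../current/finalized/subdir0/subdir0/
rm /data/dfs/dn/current/BP-.../current/finalized/subdir0/subdir0/blk_1073741825*

# Restart; the NameNode picks up the new location from the full block
# reports the DataNodes send at startup
$HADOOP_HOME/sbin/start-dfs.sh
```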
I have found that if I stop only the two DataNodes in question, move the file, and restart them, the Namenode immediately realizes that the destination DataNode now has the file (note that dfsadmin -triggerBlockReport does not work for this; the DataNodes must be restarted). However, nothing appears capable of making HDFS realize the file has been deleted from the source DataNode. The now nonexistent replica shows up as existing, healthy, and valid no matter what I try. As a result, HDFS decides the block is over-replicated and deletes a random replica, even though one of its supposedly valid replicas is actually gone.
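This is the per-DataNode variant I've been trying (Hadoop 3 daemon syntax; 9867 is the default DataNode IPC port, and the hostname is an example):

```
# On each of the two DataNodes involved
hdfs --daemon stop datanode

# ... move the blk_* file and its blk_*.meta file as shown above ...

hdfs --daemon start datanode

# Restarting makes each DataNode send a full block report; triggering a
# report without a restart did NOT make the NameNode notice the new replica
hdfs dfsadmin -triggerBlockReport dn1.example.com:9867
```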
Is there any way to force the Namenode to refresh more fully, inform it that the replica has been deleted, make it choose to delete the replica that I know no longer exists, or otherwise accomplish this task? Any help would be appreciated.
(I'm aware that the Balancer/DiskBalancer must accomplish this in some way and have looked into its source, but I found it extremely dense and would like to avoid manually editing Hadoop/HDFS source code if at all possible.)
Edit: Found a solution. If I delete the block file from the source DataNode but not its .meta file, the block report I then trigger informs the Namenode that the replica is missing. I believe that when I deleted the .meta file as well, the DataNode simply stopped reporting anything about that replica, so the Namenode never registered any change to it.
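Concretely, the sequence that ended up working for me looks roughly like this (paths, block ID, hostname, and the 9867 IPC port are illustrative; the new copy on the destination DataNode was already registered via the restart described above):

```
# On the source DataNode: delete ONLY the block file, keep the .meta file
rm /data/dfs/dn/current/BP-.../current/finalized/subdir0/subdir0/blk_1073741825

# Trigger a block report; with the block file gone (but the .meta still
# present), the report now tells the NameNode this replica is missing,
# so it no longer counts the source DataNode as a valid location
hdfs dfsadmin -triggerBlockReport dn1.example.com:9867

# (I assume the leftover .meta file can then be cleaned up as well)
rm /data/dfs/dn/current/BP-.../current/finalized/subdir0/subdir0/blk_1073741825_1001.meta
```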