
I set up a 4-node cluster for running Hadoop, following this guide: https://www.linode.com/docs/guides/how-to-install-and-set-up-hadoop-cluster/

I want to move a data block to another datanode after a map task completes, or while a map or reduce task is running.

Is there any method to do this? And after the data blocks are moved, can the MapReduce job still complete?

I tried using scp to move a data block to another datanode.

However, I am only running jobs with the yarn command and the mapreduce-examples jar (Hadoop 3.1.2).

I don't know how to modify the code.

Also, after a data block is moved, does the namenode automatically update the metadata for that block?

1 Answer


You can't just use scp or similar methods.

The namenode tracks individual block locations, and manually moving block files around will corrupt unreplicated files. For replicated files, if you move one block, HDFS may simply re-replicate it back anyway. Either way, the namenode's metadata will not be updated automatically.

Stick to hadoop fs -mv commands, which are also available through the FileSystem Java API. MapReduce is not required.
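As a minimal sketch of the FileSystem Java API route: the paths below are hypothetical, the cluster URI is assumed to come from core-site.xml, and the program needs the hadoop-client dependency on the classpath and a running HDFS cluster. FileSystem.rename() is the call behind hadoop fs -mv; the namenode updates its metadata for you, and no block data is physically copied.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsMove {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the cluster's classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical source and destination paths within HDFS
        Path src = new Path("/user/hadoop/input/part-0000");
        Path dst = new Path("/user/hadoop/archive/part-0000");

        // rename() moves the file in the namespace only; the namenode
        // updates block metadata and no datanode blocks are rewritten
        boolean moved = fs.rename(src, dst);
        System.out.println("moved: " + moved);

        fs.close();
    }
}
```

Because the rename happens in the namenode's namespace rather than on disk, this is the supported way to "move" data without corrupting block metadata.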

OneCricketeer
  • After an HDFS block is written to an SSD, I physically detach that SSD and mount it on another server. If I do that, the metadata does not change to point at the other server that mounted the SSD. Can I change the metadata directly in the namenode? I tried to find the metadata information; it includes the fsimage and edit log in the namenode directory. – arthur seokwon choi May 29 '23 at 18:16
  • The namenode data is binary, I believe. I've personally never tried to edit it. Remounting disks won't change anything. – OneCricketeer May 30 '23 at 11:04
  • Is there any method? After remounting the SSD, how can I change the metadata in the namenode? Editing the binary, or some other method? – arthur seokwon choi Jun 23 '23 at 09:15
  • I don't know what you mean by "change the metadata in the namenode". This isn't a problem a "code change" can solve, since that would mean re-compiling the Hadoop source code, and it has nothing to do with a running HDFS system. – OneCricketeer Jun 23 '23 at 19:21
  • If the Hadoop map tasks finish, I detach the SSD and attach it to another node, and the job keeps running reduce tasks. Can the namenode know that the SSD moved to the other node? Can the data on the SSD be used during the reduce tasks? I think the namenode can recognize that data. After detaching the SSD and attaching it to another node, can the reduce tasks execute successfully? – arthur seokwon choi Jun 25 '23 at 19:45
  • No. Device hardware IDs aren't tracked; only the IP/hostname of datanodes is used in block information. – OneCricketeer Jun 26 '23 at 14:52