
I have a cluster with an installation of hadoop-2.1.0-beta. Is there a way to find out where each filesplit is located in my cluster? What I am looking for is a list such as the following:

filesplit_0001 node1
filesplit_0002 node4
...

Edit: I know that such a list is available in Microsoft Azure.

– polerto

1 Answer


The fsck tool provides an easy way to find out which blocks are in any particular file. For example:

% hadoop fsck <path> -files -blocks -locations -racks
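On a typical 2.x cluster the output resembles the following; the exact layout varies by version, and the path, block IDs, and datanode addresses here are purely illustrative:

/user/foo/data.txt 268435456 bytes, 2 block(s):  OK
0. BP-929597290-10.0.0.1-1378912345678:blk_1073741825_1001 len=134217728 repl=3 [/default-rack/10.0.0.2:50010, /default-rack/10.0.0.3:50010, /default-rack/10.0.0.4:50010]
1. BP-929597290-10.0.0.1-1378912345678:blk_1073741826_1002 len=134217728 repl=3 [/default-rack/10.0.0.3:50010, /default-rack/10.0.0.4:50010, /default-rack/10.0.0.5:50010]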

Reference: Hadoop Command Line Guide.

Edit:

An input split is a chunk of the input that is processed by a single map: each map processes exactly one split. Each split is divided into records, and the map processes each record (a key-value pair) in turn. Splits and records are logical, whereas HDFS blocks are physical.

An InputSplit has a length in bytes and a set of storage locations, which are just hostname strings. A split doesn’t contain the input data; it is just a reference to the data.
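So, to build a list like the one in your question, you can ask the input format to compute the splits on the client side, before any job runs. A minimal sketch, assuming the new (org.apache.hadoop.mapreduce) API and TextInputFormat; the class name and input path argument are illustrative:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class ListSplits {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());
        FileInputFormat.addInputPath(job, new Path(args[0]));

        // Ask the input format to compute the splits exactly as a job would.
        int i = 0;
        for (InputSplit split : new TextInputFormat().getSplits(job)) {
            System.out.printf("filesplit_%04d %s%n",
                    i++, Arrays.toString(split.getLocations()));
        }
    }
}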

You can also get the InputSplit instance inside the map method:

InputSplit inputSplit = context.getInputSplit(); // the split this map task is processing
String[] splitLocations = inputSplit.getLocations(); // hostnames storing the split's data
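Putting this together, here is a minimal sketch of a mapper that emits one "split -> hosts" line per split, again assuming the new API; the class name and output format are illustrative:

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class SplitLocationMapper
        extends Mapper<LongWritable, Text, Text, NullWritable> {

    // Runs once per map task, i.e. once per split.
    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        InputSplit split = context.getInputSplit();
        String[] hosts = split.getLocations(); // hostnames storing this split's data

        // FileInputFormat-based jobs hand each task a FileSplit,
        // which also knows which file and byte range it covers.
        String name = (split instanceof FileSplit)
                ? ((FileSplit) split).getPath().getName()
                : split.toString();

        context.write(new Text(name + " -> " + Arrays.toString(hosts)),
                NullWritable.get());
    }

    // map() is intentionally empty: this job only reports split locations.
    @Override
    protected void map(LongWritable key, Text value, Context context) {
        // no per-record work needed
    }
}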
– GS Majumder
  • Thanks @mgs, this is a good answer in the case where the block size equals the filesplit size. But in my case the two have different sizes. – polerto Sep 18 '13 at 05:01