
I have a cluster with an installation of hadoop-2.1.0-beta. Is there a way to find out where each filesplit is located in my cluster? What I am looking for is a list such as the following:

filesplit_0001 node1
filesplit_0002 node4
...

Edit: I know that such a list is available in Microsoft Azure.

– polerto

1 Answer


The fsck tool provides an easy way to find out which blocks are in any particular file. For example:

% hadoop fsck <path> -files -blocks -locations -racks
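On a typical 2.x cluster the output resembles the following; the exact layout varies by version, and the path, block IDs, and datanode addresses here are purely illustrative:

/user/foo/data.txt 268435456 bytes, 2 block(s):  OK
0. BP-929597290-10.0.0.1-1378912345678:blk_1073741825_1001 len=134217728 repl=3 [/default-rack/10.0.0.2:50010, /default-rack/10.0.0.3:50010, /default-rack/10.0.0.4:50010]
1. BP-929597290-10.0.0.1-1378912345678:blk_1073741826_1002 len=134217728 repl=3 [/default-rack/10.0.0.3:50010, /default-rack/10.0.0.4:50010, /default-rack/10.0.0.5:50010]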

Reference: Hadoop Command Line Guide.

Edit:

An input split is a chunk of the input that is processed by a single map: each map processes exactly one split. Each split is divided into records, and the map processes each record (a key-value pair) in turn. Splits and records are logical, whereas HDFS blocks are physical.

An InputSplit has a length in bytes and a set of storage locations, which are just hostname strings. A split doesn’t contain the input data; it is just a reference to the data.
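So, to build a list like the one in your question, you can ask the input format to compute the splits on the client side, before any job runs. A minimal sketch, assuming the new (org.apache.hadoop.mapreduce) API and TextInputFormat; the class name and input path argument are illustrative:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class ListSplits {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());
        FileInputFormat.addInputPath(job, new Path(args[0]));

        // Ask the input format to compute the splits exactly as a job would.
        int i = 0;
        for (InputSplit split : new TextInputFormat().getSplits(job)) {
            System.out.printf("filesplit_%04d %s%n",
                    i++, Arrays.toString(split.getLocations()));
        }
    }
}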

You can also get the InputSplit instance inside the map method:

InputSplit inputSplit = context.getInputSplit(); // the split this map task is processing
String[] splitLocations = inputSplit.getLocations(); // hostnames storing the split's data
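Putting this together, here is a minimal sketch of a mapper that emits one "split -> hosts" line per split, again assuming the new API; the class name and output format are illustrative:

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class SplitLocationMapper
        extends Mapper<LongWritable, Text, Text, NullWritable> {

    // Runs once per map task, i.e. once per split.
    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        InputSplit split = context.getInputSplit();
        String[] hosts = split.getLocations(); // hostnames storing this split's data

        // FileInputFormat-based jobs hand each task a FileSplit,
        // which also knows which file and byte range it covers.
        String name = (split instanceof FileSplit)
                ? ((FileSplit) split).getPath().getName()
                : split.toString();

        context.write(new Text(name + " -> " + Arrays.toString(hosts)),
                NullWritable.get());
    }

    // map() is intentionally empty: this job only reports split locations.
    @Override
    protected void map(LongWritable key, Text value, Context context) {
        // no per-record work needed
    }
}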
– GS Majumder
  • Thanks @mgs, this is a good answer in the case where the block size equals the filesplit size. But in my case the two have different sizes. – polerto Sep 18 '13 at 05:01