
I'm new to the MapReduce framework. I want to find the number of files under a specific directory by providing the name of that directory. e.g. Suppose we have 3 directories A, B, C, having 20, 30, and 40 part-r files respectively. I'm interested in writing a Hadoop job which counts the files/records in each directory, i.e. I want the output in a .txt file formatted like this:

A is having 20 records

B is having 30 records

C is having 40 records

All these directories are present in HDFS.

Ram Ghadiyaram
Prasanna

1 Answer

The simplest, native approach is to use the built-in HDFS shell commands, in this case -count:

hdfs dfs -count /path/to/your/dir  >> output.txt
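Note that -count prints four whitespace-separated columns (DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME), so the file count is the second field. A minimal sketch of pulling that field out with awk; the sample line stands in for real -count output and its values are made up:

```shell
# hdfs dfs -count output columns: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME.
# Sample line standing in for real -count output (values are hypothetical):
sample='           1          20            409600 /path/to/your/dir'
# The file count is the second whitespace-separated field:
file_count=$(echo "$sample" | awk '{print $2}')
echo "$file_count"
```

In a real run you would pipe the command itself: hdfs dfs -count /path/to/your/dir | awk '{print $2}'.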

Or if you prefer a mixed approach via Linux commands:

hadoop fs -ls /path/to/your/dir/*  | wc -l >> output.txt
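To get output in exactly the format the question asks for ("A is having 20 records"), the per-directory counts can be piped through a small formatting loop. A sketch, assuming the directories sit under a hypothetical /data parent and that counting the part-r files per directory is what's wanted:

```shell
# format_summary reads "dir count" pairs from stdin and emits one
# "X is having N records" report line per pair.
format_summary() {
  while read -r dir n; do
    echo "$dir is having $n records"
  done
}

# Against HDFS (the /data parent path is an assumption):
# for d in A B C; do
#   echo "$d $(hdfs dfs -ls /data/$d | grep -c part-r)"
# done | format_summary >> output.txt
```

The commented loop is the HDFS-facing half; grep -c part-r counts only the part-r files, which also avoids miscounting the "Found N items" header line that -ls prints for a directory.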

Finally, the MapReduce version has already been answered here:

How do I count the number of files in HDFS from an MR job?

Code:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

int count = 0;
FileSystem fs = FileSystem.get(getConf()); // getConf() is available when extending Configured
boolean recursive = false; // set to true to also count files in subdirectories
RemoteIterator<LocatedFileStatus> ri =
        fs.listFiles(new Path("hdfs://my/path"), recursive);
while (ri.hasNext()) {
    ri.next(); // advance the iterator; we only need the count
    count++;
}
System.out.println("The count is: " + count);
Petro