
I would like to list files using the Hadoop command `hadoop fs -ls filepath`, and I want to write Java code to achieve this. Can I write a small piece of Java code, make a jar of it, and supply it to a MapReduce job (Amazon EMR) to achieve this? Can you please point me to the code and the steps with which I can achieve this?

2 Answers


You can list files in HDFS using Java code like the following:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

...

Configuration configuration = new Configuration(); 

FileSystem hdfs = FileSystem.get(new URI("hdfs://localhost:54310"), configuration);
FileStatus[] fileStatus = hdfs.listStatus(new Path("hdfs://localhost:54310/user/path"));

Path[] paths = FileUtil.stat2Paths(fileStatus);

for (Path path : paths) {
  System.out.println(path);
}

Use this in your MapReduce trigger code (the main or run method) to get the list, then pass the paths as args to your MapReduce classes.
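For example, the paths collected in the driver can be joined into a single comma-separated string, which is a common shape for handing several input paths to a job as one argument. A minimal sketch using plain strings as stand-in path values (no cluster required; in the real driver these values would come from `FileUtil.stat2Paths(...)` as shown above):

```java
import java.util.Arrays;
import java.util.List;

public class PathArgs {

    // Join a list of path strings into one comma-separated argument,
    // suitable for passing to a MapReduce driver as a single arg.
    static String toArg(List<String> paths) {
        return String.join(",", paths);
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        List<String> paths = Arrays.asList(
                "hdfs://localhost:54310/user/path/part-00000",
                "hdfs://localhost:54310/user/path/part-00001");
        System.out.println(toArg(paths));
    }
}
```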

Option 2

  1. create a shell script that reads the list of files using the hadoop fs -ls command
  2. provide this script as part of the EMR bootstrap script to get the list of files
  3. in the same script, write code to save the paths to a text file under /mnt/
  4. read that file from your MapReduce code and pass the paths as args to your mappers and reducers
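Step 4 above can be sketched in plain Java: the driver reads the text file the bootstrap script wrote under /mnt/ and turns each line into a path argument. The file name files.txt is an assumption for illustration; match it to whatever your script actually writes:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class ReadPathList {

    // Read one HDFS path per line from the file the bootstrap script wrote.
    static List<String> readPaths(Path listFile) throws IOException {
        return Files.readAllLines(listFile);
    }

    public static void main(String[] args) throws IOException {
        // "/mnt/files.txt" is a hypothetical location for this sketch.
        Path listFile = Paths.get("/mnt/files.txt");
        if (Files.exists(listFile)) {
            for (String p : readPaths(listFile)) {
                System.out.println(p);
            }
        }
    }
}
```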
Sandesh Deshmane

Here is my GitHub repository.

Simple commands like making a folder, putting files to HDFS, reading, listing, and writing data are present in the JAVA API folder.

And you can explore the other folders to find MapReduce code in Java.

Aadish Goel