I would like to list files the way the hadoop command "hadoop fs -ls filepath" does, but from Java code. Can I write a small piece of Java code, build a jar of it, and supply it to a MapReduce job (Amazon EMR) to achieve this? Can you please point me to the code and the steps with which I can achieve this?
2 Answers
You can list files in HDFS using Java code like this:
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
...
// Connect to the HDFS NameNode and list the contents of a directory.
Configuration configuration = new Configuration();
FileSystem hdfs = FileSystem.get(new URI("hdfs://localhost:54310"), configuration);
FileStatus[] fileStatus = hdfs.listStatus(new Path("hdfs://localhost:54310/user/path"));
Path[] paths = FileUtil.stat2Paths(fileStatus);
for (Path path : paths) {
    System.out.println(path);
}
Use this in your MapReduce trigger code (the main or run method) to get the list, then pass it as arguments to your MapReduce job.
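One way to wire this in is sketched below. The helper name `withPaths` and the hard-coded paths are illustrative assumptions; in a real driver the paths would come from `FileUtil.stat2Paths(...)` as shown above, and the combined array would be handed to your job's run() or main() method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: append the paths obtained from hdfs.listStatus (hard-coded here
// for illustration) to the original program arguments, so the combined
// array can be passed on to the MapReduce job class.
public class ArgBuilder {

    static String[] withPaths(String[] args, List<String> hdfsPaths) {
        List<String> all = new ArrayList<>(Arrays.asList(args));
        all.addAll(hdfsPaths);
        return all.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // In the real driver these would come from FileUtil.stat2Paths(...).
        List<String> listed = Arrays.asList(
                "hdfs://localhost:54310/user/path/part-00000",
                "hdfs://localhost:54310/user/path/part-00001");
        String[] combined = withPaths(new String[] {"outputDir"}, listed);
        System.out.println(String.join(" ", combined));
    }
}
```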
Option 2
- create a shell script that reads the list of files using the hadoop fs -ls command
- provide this script as part of the EMR bootstrap actions to get the list of files
- in the same script, save the paths to a text file under /mnt/
- read this file from your MapReduce code and pass the paths to the argument list for your mappers and reducers
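The last step above can be sketched in Java. The parsing logic is an assumption for illustration: `hadoop fs -ls` prints one entry per line with the path as the last whitespace-separated field, preceded by a "Found N items" header, so the driver can read the saved output and extract just the paths.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: parse saved "hadoop fs -ls" output. Each entry line ends with
// the file path as the last whitespace-separated field; header lines
// like "Found 2 items" are skipped.
public class LsOutputParser {

    static List<String> extractPaths(List<String> lsLines) {
        List<String> paths = new ArrayList<>();
        for (String line : lsLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("Found ")) {
                continue; // header, not a file entry
            }
            String[] fields = trimmed.split("\\s+");
            paths.add(fields[fields.length - 1]); // path is the last field
        }
        return paths;
    }

    public static void main(String[] args) {
        // In the real driver these lines would be read from the text file
        // the bootstrap script saved under /mnt/.
        List<String> sample = Arrays.asList(
                "Found 1 items",
                "-rw-r--r--   1 hadoop hadoop  1024 2015-01-01 10:00 /user/path/part-00000");
        for (String p : extractPaths(sample)) {
            System.out.println(p); // prints /user/path/part-00000
        }
    }
}
```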

Tom

Sandesh Deshmane
Simple commands like creating a folder, putting files into HDFS, reading, listing, and writing data are all available through the Hadoop Java API (the org.apache.hadoop.fs.FileSystem class). You can also explore the other folders in the Hadoop distribution for MapReduce example code in Java.
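For reference, here is a hedged sketch of those simple operations using the org.apache.hadoop.fs.FileSystem API. The NameNode URI and all paths are placeholders, and the Hadoop client libraries must be on the classpath; adjust everything for your cluster.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SimpleHdfsOps {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode URI and paths below are placeholders for illustration.
        FileSystem fs = FileSystem.get(new URI("hdfs://localhost:54310"), conf);

        // Make a folder.
        fs.mkdirs(new Path("/user/demo"));

        // Put a local file into HDFS.
        fs.copyFromLocalFile(new Path("/tmp/local.txt"), new Path("/user/demo/local.txt"));

        // Write data.
        try (FSDataOutputStream out = fs.create(new Path("/user/demo/hello.txt"))) {
            out.writeUTF("hello hdfs");
        }

        // Read the data back.
        try (FSDataInputStream in = fs.open(new Path("/user/demo/hello.txt"))) {
            System.out.println(in.readUTF());
        }

        // List the folder.
        for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
            System.out.println(status.getPath());
        }

        fs.close();
    }
}
```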

Aadish Goel