
My S3 directory is

/sssssss/xxxxxx/rrrrrr/xx/file1
/sssssss/xxxxxx/rrrrrr/xx/file2
/sssssss/xxxxxx/rrrrrr/xx/file3
/sssssss/xxxxxx/rrrrrr/yy/file4
/sssssss/xxxxxx/rrrrrr/yy/file5
/sssssss/xxxxxx/rrrrrr/yy/file6

How can my MapReduce program read these files on S3?

Bill Bell
llxlf

2 Answers


For one input path you do the following:

FileInputFormat.addInputPath(job, new Path("/sssssss/xxxxxx/rrrrrr/xx/"));

For two input paths, you do the following:

FileInputFormat.addInputPath(job, new Path("/sssssss/xxxxxx/rrrrrr/xx/"));
FileInputFormat.addInputPath(job, new Path("/sssssss/xxxxxx/rrrrrr/yy/"));

or use addInputPaths(), which takes a comma-separated list of paths. See the documentation of FileInputFormat (it varies slightly depending on your version of Hadoop) for more details.
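Putting this together, a minimal driver sketch, assuming the newer org.apache.hadoop.mapreduce API; the bucket name mybucket, the job name, and the s3a:// scheme are placeholders for illustration, not from the question (older clusters may need s3:// or s3n:// instead):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultiDirJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "read two S3 directories");
        job.setJarByClass(MultiDirJob.class);

        // Register each input directory explicitly; every file directly
        // under each registered path becomes part of the job's input.
        FileInputFormat.addInputPath(job, new Path("s3a://mybucket/sssssss/xxxxxx/rrrrrr/xx/"));
        FileInputFormat.addInputPath(job, new Path("s3a://mybucket/sssssss/xxxxxx/rrrrrr/yy/"));

        // Equivalently, addInputPaths() accepts one comma-separated string
        // naming both directories in a single call.

        FileOutputFormat.setOutputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This is job configuration only; mapper/reducer setup is omitted, and it needs a Hadoop installation with the S3 filesystem connector on the classpath to run.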

vefthym

This can be simplified as follows:

FileInputFormat.setInputDirRecursive(job, true);
FileInputFormat.addInputPaths(job, args[0]);

You just need to give the base path of the S3 directory rather than the exact location of each file; with the recursive flag set, Hadoop descends into every subdirectory until it reaches the files.
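The effect of the recursive flag can be illustrated without a Hadoop cluster. The sketch below mimics it in plain Java on a local temp directory laid out like the question's S3 tree (xx/file1..3, yy/file4..6); the class and directory names here are invented for the demonstration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecursiveListing {
    // Collect every regular file under the base directory, the way
    // FileInputFormat gathers input splits when setInputDirRecursive
    // is enabled: one base path in, all leaf files out.
    static List<Path> listFilesRecursively(Path base) throws IOException {
        try (Stream<Path> walk = Files.walk(base)) {
            return walk.filter(Files::isRegularFile)
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a local stand-in for /sssssss/xxxxxx/rrrrrr/{xx,yy}.
        Path base = Files.createTempDirectory("rrrrrr");
        Files.createDirectory(base.resolve("xx"));
        Files.createDirectory(base.resolve("yy"));
        for (int i = 1; i <= 3; i++) {
            Files.createFile(base.resolve("xx").resolve("file" + i));
            Files.createFile(base.resolve("yy").resolve("file" + (i + 3)));
        }
        // A single base path is enough to reach all six files.
        System.out.println(listFilesRecursively(base).size());
    }
}
```

Running it prints 6: the walk from the single base directory finds all six files, which is exactly why the recursive answer needs only args[0] rather than one path per subdirectory.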

Deepan Ram