I'm pretty new to the Hadoop environment. Recently, I ran a basic MapReduce program. It was easy to run.

Now I have an input file with the following contents inside the input path directory:

fileName1
fileName2
fileName3
...

I need to read the lines of this file one by one and create a new file for each name (i.e. fileName1, fileName2, and so on) in a specified output directory.

I wrote the map implementation below, but it didn't work:

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {

        String fileName = value.toString();
        String path = outputFilePath + File.separator + fileName;
        File newFile = new File(path);

        newFile.mkdirs();
        newFile.createNewFile();
    }

Can somebody explain what I've missed?

Thanks

2 Answers

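I think you should start by studying the FileSystem class. java.io.File only works on the local filesystem of the node that runs the task, so as far as I know you have to go through FileSystem to create files in the distributed filesystem. Here's a code example where I opened a file for reading; for writing you probably just need an FSDataOutputStream, which FileSystem.create() returns. In your mapper you can get the configuration from the Context class (context.getConfiguration()).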

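    // `job` and `file` come from the surrounding driver code (not shown).
    Configuration conf = job.getConfiguration();
    Path inFile = new Path(file);
    try {
        FileSystem fs = FileSystem.get(conf);

        if (!fs.exists(inFile)) {
            // Only logs; fs.open() below still throws if the file is missing.
            System.out.println("Unable to open settings file: " + file);
        }

        FSDataInputStream in = fs.open(inFile);
        ...
    } catch (IOException e) {
        e.printStackTrace();
    }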
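Separately from the HDFS point, the map body in the question has a plain java.io problem: mkdirs() is called on the full path, so it creates a directory named fileName1, and the following createNewFile() then returns false because that name is already taken. The sketch below reproduces this on the local filesystem only (no Hadoop involved). The class name, method names, and temp directory are made up for illustration, and the "fixed" variant is just one way to do it.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class LocalFileDemo {

    // Mirrors the question's map body: mkdirs() on the full path, then createNewFile().
    static boolean createLikeQuestion(File target) throws IOException {
        target.mkdirs();                // creates a DIRECTORY with the target's name
        return target.createNewFile();  // returns false: the name already exists
    }

    // Creates only the parent directories, then the empty file itself.
    static boolean createFixed(File target) throws IOException {
        File parent = target.getParentFile();
        if (parent != null) {
            parent.mkdirs();
        }
        return target.createNewFile();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical scratch directory standing in for the output directory.
        File base = Files.createTempDirectory("demo").toFile();
        File a = new File(base, "out1" + File.separator + "fileName1");
        File b = new File(base, "out2" + File.separator + "fileName1");

        System.out.println(createLikeQuestion(a) + " directory=" + a.isDirectory());
        System.out.println(createFixed(b) + " file=" + b.isFile());
    }
}
```

Run locally, the first call should report false with fileName1 left behind as a directory, while the second should report true with a regular file. Inside a real mapper this would still only touch the local disk of whichever node runs the task, which is why FileSystem is the better route for HDFS output.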
DDW

First, get the path of the input directory inside your mapper with the help of FileSplit. Then append the name of the file that contains all these lines, and read that file line by line through the FSDataInputStream returned by fs.open(). Something like this:

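    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {

        FileSplit fileSplit = (FileSplit) context.getInputSplit();
        FileSystem fs = FileSystem.get(context.getConfiguration());
        // "file.txt" stands for the file in the input directory that lists the names.
        Path listFile = new Path(fileSplit.getPath().getParent(), "file.txt");
        // Needs java.io.BufferedReader, java.io.InputStreamReader, java.nio.charset.StandardCharsets.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(listFile), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // create() returns an open stream; closing it leaves an empty file.
                fs.create(new Path(line)).close();
            }
        }
        // Proceed further....
    }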
Tariq