
Is there a way to write a file from my Java jar to the S3 folder where my reduce output files are written? I have tried something like:

    FileSystem fs = FileSystem.get(conf);
    FSDataOutputStream FS = fs.create(new Path("S3 folder output path"+"//Result.txt"));        

    PrintWriter writer  = new PrintWriter(FS);
    writer.write(averageDelay.toString());
    writer.close();
    FS.close();

Here `Result.txt` is the new file I want to write.

Rico
hitrix
  • btw, why not use [DistributedCache](http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/filecache/DistributedCache.html)? It is as portable as the approach you're doing but perhaps more useful for long-running jobs – aldrinleal Feb 14 '14 at 03:48

3 Answers

0

Answering my own question:

I found my mistake. `FileSystem.get(conf)` returns the default filesystem from the configuration rather than S3, so I should pass the URI of the S3 folder path to `FileSystem.get`, as below:

    // conf is the job's Hadoop Configuration; otherArgs[1] is assumed to hold the S3 folder URI
    FileSystem fileSystem = FileSystem.get(URI.create(otherArgs[1]), conf);
    Path resultPath = new Path(otherArgs[1], "Result.txt");
    // try-with-resources closes the writer, which also closes the underlying stream
    try (PrintWriter writer = new PrintWriter(fileSystem.create(resultPath))) {
        writer.write("\n Average Delay:" + averageDelay);
    }
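
A minimal sketch of why the URI matters, using only `java.net.URI` so it runs without Hadoop on the classpath. `FileSystem.get(URI, conf)` selects the filesystem from the URI's scheme and authority, and falls back to the configured default when both are absent. The class name `UriSchemeDemo`, its helper methods, and the bucket name `my-bucket` are hypothetical, chosen only for illustration.

```java
import java.net.URI;

// Hypothetical demo class: shows the URI parts a scheme-based lookup can key on.
final class UriSchemeDemo {

    // Returns the scheme ("s3", "hdfs", ...) or null for a scheme-less path.
    static String schemeOf(String location) {
        return URI.create(location).getScheme();
    }

    // Returns the authority (for S3-style URIs, the bucket name) or null.
    static String authorityOf(String location) {
        return URI.create(location).getAuthority();
    }

    public static void main(String[] args) {
        String s3Folder = "s3://my-bucket/output"; // assumed example value
        String barePath = "/user/hadoop/output";   // no scheme, no authority

        System.out.println(schemeOf(s3Folder) + " " + authorityOf(s3Folder));
        System.out.println(schemeOf(barePath) + " " + authorityOf(barePath));
    }
}
```

With `FileSystem.get(conf)` no URI is consulted at all, so the default filesystem is returned regardless of what the path passed to `create` says.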
hitrix
  • What is `conf` in your code? What is `otherArgs[1]`? This code is not helpful to someone finding the question later – Daniel Kats Nov 03 '16 at 01:14
0

    FileSystem fileSystem = FileSystem.get(URI.create(otherArgs[1]), new JobConf(<Your_Class_Name_here>.class));
    FSDataOutputStream fsDataOutputStream = fileSystem.create(new Path(otherArgs[1] + "//Result.txt"));
    PrintWriter writer = new PrintWriter(fsDataOutputStream);
    writer.write("\n Average Delay:" + averageDelay);
    writer.close();
    fsDataOutputStream.close();

This is how I supplied the `conf` variable in the code above: I built a `JobConf` from the job class, and it worked like a charm.

Vinodh Thiagarajan
0

Here's another way to do it in Java: call the AWS S3 client's `putObject` directly, passing the content as a string.

    ...
    AmazonS3 s3Client;

    // Reducer.reduce declares IOException and InterruptedException; an override cannot widen that to Exception
    public void reduce(Text key, Iterable<Text> values, Reducer<Text, Text, Text, Text>.Context context)
            throws IOException, InterruptedException {

        UUID fileUUID = UUID.randomUUID();
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));

        // One S3 object per reduce key: nightly-dump/<UTC date>/<key>-<uuid>
        String fileName = String.format("nightly-dump/%s/%s-%s", sdf.format(new Date()), key, fileUUID);
        log.info("Filename = [{}]", fileName);

        // StringBuilder avoids re-copying the whole string on every iteration
        StringBuilder content = new StringBuilder();
        int count = 0;
        for (Text value : values) {
            count++;
            content.append(value.toString()).append("\n");
        }
        log.info("Count = {}, S3Lines = \n{}", count, content);

        PutObjectResult putObjectResult = s3Client.putObject(S3_BUCKETNAME, fileName, content.toString());
        log.info("Put versionId = {}", putObjectResult.getVersionId());

        // reduceWriteContext is a helper from the author's class that is not shown here
        reduceWriteContext("1", "1");

        context.setStatus("COMPLETED");
    }
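
The key-building and line-joining steps can be tried without an S3 client. The sketch below isolates them using only the JDK. It swaps `SimpleDateFormat` for `java.time`, and the class name `DumpKeySketch` and its methods are hypothetical, not part of the answer's code.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.UUID;

// Hypothetical helper: mirrors the key layout "nightly-dump/<date>/<key>-<uuid>".
final class DumpKeySketch {

    // Builds the object key; the date is passed in so the result is reproducible.
    static String objectKey(LocalDate date, String key, UUID id) {
        String day = date.format(DateTimeFormatter.ISO_LOCAL_DATE); // yyyy-MM-dd
        return String.format("nightly-dump/%s/%s-%s", day, key, id);
    }

    // Joins values one per line with a StringBuilder instead of repeated String +=.
    static String joinLines(Iterable<String> values) {
        StringBuilder content = new StringBuilder();
        for (String value : values) {
            content.append(value).append('\n');
        }
        return content.toString();
    }

    public static void main(String[] args) {
        // In a job the date would come from the clock in UTC; it is fixed here for illustration.
        String key = objectKey(LocalDate.of(2014, 2, 14), "some-key", UUID.randomUUID());
        System.out.println(key);
        System.out.print(joinLines(Arrays.asList("first line", "second line")));
    }
}
```

Passing the date and UUID in as parameters keeps the formatting logic checkable on its own, separate from the upload call.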
dvallejo