
My goal is to:

  1. Read a file from S3,
  2. change its metadata,
  3. and push it back out to S3.

The AWS Java SDK doesn't accept an OutputStream for uploads. Therefore, I have to convert the OutputStream from step 2 into an InputStream. For this I decided to use PipedInputStream.

However, my code just hangs at the writeTo(out) step. This code is in a Grails application. While it hangs, CPU usage stays low:

import org.apache.commons.imaging.formats.jpeg.xmp.JpegXmpRewriter;

AmazonS3Client client = nfile.getS3Client() //get S3 client
S3Object object1 = client.getObject(
                  new GetObjectRequest("test-bucket", "myfile.jpg")) //get the object. 

InputStream isNew1 = object1.getObjectContent(); //create input stream
ByteArrayOutputStream os = new ByteArrayOutputStream();
PipedInputStream inpipe = new PipedInputStream();
final PipedOutputStream out = new PipedOutputStream(inpipe);

try {
   String xmpXml = "<x:xmpmeta>" +
    "\n<Lifeshare>" +
    "\n\t<Date>"+"some date"+"</Date>" +
    "\n</Lifeshare>" +
    "\n</x:xmpmeta>";
   JpegXmpRewriter rewriter = new JpegXmpRewriter();
   rewriter.updateXmpXml(isNew1,os, xmpXml); //This is step2

   try {
      new Thread(new Runnable() {
         public void run() {
            try {
               // write the original OutputStream to the PipedOutputStream
               println "starting writeto"
               os.writeTo(out);
               println "ending writeto"
            } catch (IOException e) {
               // logging and exception handling should go here
            }
         }
      }).start();

      ObjectMetadata metadata = new ObjectMetadata();
      metadata.setContentLength(1024); //just testing
      client.putObject(new PutObjectRequest("test-bucket", "myfile_copy.jpg", inpipe, metadata));
      os.writeTo(out);

      os.close();
      out.close();
   } catch (IOException e) {
         // logging and exception handling should go here
   }

}
finally {
   isNew1.close()
   os.close()
   out.close()
}

The above code just prints `starting writeto` and hangs; it never prints `ending writeto`.

Update: By putting the writeTo in a separate thread, the file is now being written to S3; however, only 1024 bytes of it are written, so the file is incomplete. How can I write everything from the OutputStream to S3?

Omnipresent
  • You write to the PipedOutputStream: os.writeTo(out), but do you read from the PipedInputStream which is connected to it ? – zeppelin Oct 26 '16 at 19:24
  • Yes, I do. I did not show that code since the program was hanging before that code but for completeness I've added it now. – Omnipresent Oct 26 '16 at 19:40

2 Answers


When you call os.writeTo(out), it tries to flush the entire stream to out, and since nobody is reading from the other side of the pipe (i.e. inpipe) yet, the pipe's internal buffer fills up and the writing thread blocks.

You have to set up the reader before you write the data, and also make sure reading and writing happen in separate threads (see the javadoc on PipedOutputStream).
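A minimal sketch of this pattern in plain Java (no S3 involved; the class name and payload size are just illustrative): a writer thread pushes the buffered bytes through the pipe while the caller drains it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipeDemo {

    // Convert a ByteArrayOutputStream into an InputStream via a pipe.
    // The write happens on its own thread so the caller can drain the
    // pipe concurrently -- otherwise writeTo() blocks as soon as the
    // pipe's (1024-byte by default) internal buffer fills up.
    static InputStream toInputStream(ByteArrayOutputStream src) throws IOException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in); // connect the two ends
        new Thread(() -> {
            try {
                src.writeTo(out);
                out.close(); // signals end-of-stream to the reader
            } catch (IOException e) {
                e.printStackTrace();
            }
        }).start();
        return in;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        os.write(new byte[5000]); // deliberately larger than the pipe buffer

        InputStream in = toInputStream(os);
        int total = 0;
        byte[] buf = new byte[512];
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        System.out.println(total); // all 5000 bytes arrive, not just 1024
    }
}
```

Without the separate thread, the same code would deadlock exactly as described in the question: writeTo() fills the 1024-byte pipe buffer and then waits forever for a reader.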

zeppelin
  • I moved the code for consuming (`S3object`) above the `writeTo` but the result is the same... – Omnipresent Oct 26 '16 at 19:53
  • Please try using the TransferManager instead of AmazonS3Client, to make the upload in a separate thread (that is important as you can not read and write to the PipedOutputStream in the same thread) – zeppelin Oct 26 '16 at 19:55
  • Basically your client.putObject() will block trying to read from inpipe (because no data has been written to it yet). – zeppelin Oct 26 '16 at 19:57
  • Ok I put the `writeTo` in a different thread and after that I use `PutObjectRequest`. Now the file is being written to S3, however, its size is only 1024. So it is incomplete. Please see the updated code. – Omnipresent Oct 26 '16 at 20:09
  • How can I write the entire file to s3? – Omnipresent Oct 26 '16 at 20:12
  • Do you really need this line? metadata.setContentLength(1024); //just testing – zeppelin Oct 26 '16 at 20:13
  • Without that line I get an error `Message: Write end dead ->> 311 | read in java.io.PipedInputStream - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | 378 | read in '' | 72 | read . . . . . . . . in com.amazonaws.internal.SdkFilterInputStream` Perhaps the number needs to change from 1024 to something else but not sure where I get that number from.. – Omnipresent Oct 26 '16 at 20:16
  • Changing that number to `2048` writes a file to S3 with size `2048`. So I think what I need is the actual length of the outputstream. – Omnipresent Oct 26 '16 at 20:19
  • Please remove metadata.setContentLength(1024) (you do not need it), and move out.close() inside your writing thread. What most probably happens is that the thread terminates while putObject() is still reading, hence the "write end dead" exception. – zeppelin Oct 26 '16 at 20:22
  • I changed it to `metadata.setContentLength(out.size())` which worked fine. Thanks a ton for your help – Omnipresent Oct 26 '16 at 20:27
  • this answer would benefit from code outlining the idea, rather than theory – bharal Aug 07 '18 at 17:36
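The content-length fix the comments converge on can be shown in isolation: the correct Content-Length is the actual size of the buffered data, which ByteArrayOutputStream exposes via size(). A tiny illustration (the 3000-byte payload is arbitrary):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class ContentLengthDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        os.write(new byte[3000]);

        // Instead of hard-coding 1024, take the real length of the buffered
        // data (in the question's code: metadata.setContentLength(os.size())).
        long contentLength = os.size();
        System.out.println(contentLength);
    }
}
```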

Per Bharal's request: I just worked through this issue myself thanks to the comments above, so I'm adding sample code. Hope it helps someone out!

public void doSomething() throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    baos.write("some bytes to stick in the stream".getBytes());

    InputStream inStr = toInputStream(baos);
}

public InputStream toInputStream(ByteArrayOutputStream orgOutStream) throws IOException {
    PipedInputStream in = new PipedInputStream();
    // the PipedOutputStream must be connected to the PipedInputStream
    PipedOutputStream out = new PipedOutputStream(in);

    new Thread(() -> {
        try {
            orgOutStream.writeTo(out);
            out.close(); // signal end-of-stream to the reader
        } catch (IOException e) {
            e.printStackTrace();
        }
    }).start();

    return in;
}

The real trick is to ensure the piped streams are written and read in separate threads.

lees2bytes