So, imagine that I have a Scala Vert.x Web REST API that receives file uploads via HTTP multipart requests. However, it doesn't receive the incoming file data as a single InputStream. Instead, each file is received as a series of byte buffers handed over via a few callback functions.
The callbacks basically look like this:
// the callback that receives byte buffers (chunks) of the file being uploaded
// it is called multiple times until the full file has been received
upload.handler { buffer =>
  // send chunk to backend
}

// the callback that gets called after the full file has been uploaded
// (i.e. after all chunks have been received)
upload.endHandler { _ =>
  // do something after the file has been uploaded
}

// callback called if an exception is raised while receiving the file
upload.exceptionHandler { e =>
  // do something to handle the exception
}
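To make the callback contract above concrete, here is a minimal, dependency-free model of it. `FakeUpload` is a hypothetical stand-in, not the Vert.x `HttpServerFileUpload` API; it only mimics the ordering described above, where `handler` fires once per chunk and `endHandler` fires once after the last chunk.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a Vert.x upload (NOT the real Vert.x API).
// It only models the contract: many handler calls, then exactly one endHandler call.
final class FakeUpload(chunks: Seq[Array[Byte]]) {
  private var onChunk: Array[Byte] => Unit = _ => ()
  private var onEnd: () => Unit = () => ()

  def handler(f: Array[Byte] => Unit): FakeUpload = { onChunk = f; this }
  def endHandler(f: () => Unit): FakeUpload = { onEnd = f; this }

  // Deliver every chunk in order, then signal completion once.
  def run(): Unit = {
    chunks.foreach(onChunk)
    onEnd()
  }
}

object CallbackDemo {
  def main(args: Array[String]): Unit = {
    val received = ArrayBuffer.empty[Int]
    var finished = false
    new FakeUpload(Seq("hello ".getBytes("UTF-8"), "world".getBytes("UTF-8")))
      .handler(bytes => received += bytes.length)
      .endHandler(() => finished = true)
      .run()
    println(s"chunk sizes: ${received.mkString(",")}, finished: $finished")
  }
}
```

Running `CallbackDemo` prints `chunk sizes: 6,5, finished: true`.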
Now, I'd like to use these callbacks to save the file into a MinIO bucket (MinIO, if you're unfamiliar, is basically a self-hosted S3, and its API is pretty much the same as the S3 Java API).
Since I don't have a file handle, I need to use putObject() to put an InputStream into MinIO.
The inefficient work-around I'm currently using with the MinIO Java API looks like this:
// this is all inside the context of handling an HTTP request
import io.minio.PutObjectArgs
import java.io.{PipedInputStream, PipedOutputStream}

val out = new PipedOutputStream()
val in = new PipedInputStream()
var size = 0L
in.connect(out)

upload.handler { buffer =>
  out.write(buffer.getBytes) // write each received chunk into the pipe
  size += buffer.length()
}

upload.endHandler { _ =>
  minioClient.putObject(
    PutObjectArgs.builder()
      .bucket("my-bucket")
      .`object`("my-filename") // `object` is a Scala keyword, so it needs backticks
      .stream(in, size, 50000000)
      .build())
}
Obviously, this isn't optimal. Since I'm using a simple java.io stream here, the entire file ends up getting loaded into memory.
I don't want to save the file to disk on the server before putting it into object storage; I'd like to stream it straight into object storage.
How could I accomplish this using the S3 API and the series of byte buffers given to me via the upload.handler callback?
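For comparison with the piped-stream workaround above: java.io piped streams are designed for a producer and a consumer running on separate threads (the JDK documentation warns that using both ends from a single thread may deadlock). The sketch below is a minimal, dependency-free illustration of that pattern under stated assumptions: `streamThroughPipe` is a hypothetical helper, the chunk writes stand in for `upload.handler`, closing the output stands in for `upload.endHandler`, and a background reader plays the role of an SDK call consuming the InputStream. It is not Vert.x- or MinIO-specific; in Vert.x the blocking `write` would still have to be kept off the event loop.

```scala
import java.io.{PipedInputStream, PipedOutputStream}
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object PipedStreamingDemo {
  // Hypothetical helper: the caller's thread writes chunks, a second thread drains the pipe.
  // The pipe buffer is fixed-size, so the writer blocks whenever the reader falls behind.
  def streamThroughPipe(chunks: Seq[Array[Byte]], bufferSize: Int = 1024): Long = {
    val out = new PipedOutputStream()
    val in = new PipedInputStream(out, bufferSize)
    val pool = Executors.newSingleThreadExecutor()
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)

    // Stand-in for a consumer (e.g. an upload call) that reads the InputStream until EOF.
    val consumed: Future[Long] = Future {
      val buf = new Array[Byte](4096)
      var total = 0L
      var n = in.read(buf)
      while (n != -1) { total += n; n = in.read(buf) }
      in.close()
      total
    }

    try chunks.foreach(c => out.write(c)) // producer side, one write per chunk
    finally out.close()                   // signals EOF to the reader

    try Await.result(consumed, 30.seconds)
    finally pool.shutdown()
  }

  def main(args: Array[String]): Unit = {
    val chunks = Seq.fill(100)(new Array[Byte](8192))
    println(streamThroughPipe(chunks))
  }
}
```

With 100 chunks of 8192 bytes, `main` prints `819200`, even though the pipe itself never holds more than 1024 bytes at a time.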
EDIT
I should add that I am using MinIO because I cannot use a commercially hosted cloud solution like S3. However, as mentioned on MinIO's website, I can use Amazon's S3 Java SDK with MinIO as my storage backend.
I attempted to follow this guide on Amazon's website for uploading objects to S3 in chunks.
The solution I attempted looks like this:
import com.amazonaws.services.s3.model.{CompleteMultipartUploadRequest, InitiateMultipartUploadRequest, PartETag, UploadPartRequest}
import java.io.ByteArrayInputStream
import java.util

context.request.uploadHandler { upload =>
  println(s"Filename: ${upload.filename()}")
  val partETags = new util.ArrayList[PartETag]
  val initRequest = new InitiateMultipartUploadRequest("docs", "my-filekey")
  val initResponse = s3Client.initiateMultipartUpload(initRequest)

  upload.handler { buffer =>
    println(s"uploading part: ${buffer.length()} bytes")
    try {
      val request = new UploadPartRequest()
        .withBucketName("docs")
        .withKey("my-filekey")
        .withPartSize(buffer.length())
        .withUploadId(initResponse.getUploadId)
        .withInputStream(new ByteArrayInputStream(buffer.getBytes()))
      val uploadResult = s3Client.uploadPart(request)
      partETags.add(uploadResult.getPartETag)
    } catch {
      case e: Exception => println(s"Exception raised: $e")
    }
  }

  // this gets called for EACH uploaded file sequentially
  upload.endHandler { _ =>
    // upload successful
    println("done uploading")
    try {
      val compRequest = new CompleteMultipartUploadRequest("docs", "my-filekey", initResponse.getUploadId, partETags)
      s3Client.completeMultipartUpload(compRequest)
    } catch {
      case e: Exception => println(s"Exception raised: $e")
    }
    context.response.setStatusCode(200).end("Uploaded")
  }

  upload.exceptionHandler { e =>
    // handle the exception
    println(s"exception thrown: $e")
  }
}
}
This works for files that are small (my test small file was 11 bytes), but not for large files.
In the case of large files, the code inside upload.handler runs progressively slower as the file continues to upload. Also, upload.endHandler is never called, and the file somehow keeps uploading after 100% of it has been uploaded.
However, as soon as I comment out the s3Client.uploadPart(request) call inside upload.handler and the s3Client.completeMultipartUpload call inside upload.endHandler (basically throwing the file away instead of saving it to object storage), the upload progresses normally and terminates correctly.
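One constraint relevant to the multipart attempt above: S3 multipart uploads number parts from 1 to 10,000, and every part except the last must be at least 5 MiB, while the buffers passed to upload.handler can be much smaller than that. The following is a minimal sketch, assuming a hypothetical `PartAccumulator` helper (not part of the AWS or MinIO SDKs) that buffers incoming chunks until a part is large enough and then hands it to a caller-supplied `uploadPart` function together with its 1-based part number. The actual SDK call is deliberately left out.

```scala
import java.io.ByteArrayOutputStream
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper (not an SDK class): groups small chunks into parts of at least
// `minPartSize` bytes and passes each part to `uploadPart` with a 1-based part number.
final class PartAccumulator(minPartSize: Int, uploadPart: (Int, Array[Byte]) => Unit) {
  private val pending = new ByteArrayOutputStream()
  private var nextPartNumber = 1

  // Call once per incoming chunk (e.g. from upload.handler).
  def append(chunk: Array[Byte]): Unit = {
    pending.write(chunk, 0, chunk.length)
    if (pending.size() >= minPartSize) flush()
  }

  // Call once at the end (e.g. from upload.endHandler); the last part may be smaller.
  // Returns the number of parts emitted (0 for an empty upload).
  def finish(): Int = {
    if (pending.size() > 0) flush()
    nextPartNumber - 1
  }

  private def flush(): Unit = {
    require(nextPartNumber <= 10000, "S3 multipart uploads allow at most 10,000 parts")
    uploadPart(nextPartNumber, pending.toByteArray)
    nextPartNumber += 1
    pending.reset()
  }
}

object PartAccumulatorDemo {
  def main(args: Array[String]): Unit = {
    val parts = ArrayBuffer.empty[(Int, Int)]
    // A tiny threshold (10 bytes) just for the demo; a real upload would use >= 5 MiB.
    val acc = new PartAccumulator(10, (n, bytes) => parts += ((n, bytes.length)))
    Seq.fill(7)(new Array[Byte](4)).foreach(acc.append) // 28 bytes in 4-byte chunks
    val count = acc.finish()
    println(s"$count parts: ${parts.mkString(", ")}")
  }
}
```

Running `PartAccumulatorDemo` prints `3 parts: (1,12), (2,12), (3,4)`. In a real handler, `uploadPart` would be where an `UploadPartRequest` is built with that part number; memory per upload then stays bounded by roughly one part rather than the whole file.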