I have an akka-http service and I am trying out the alpakka s3 connector for uploading files. Previously I was using a temporary file and then uploading with Amazon SDK. This approach required some adjustments for Amazon SDK to make it more scala like, but it could handle even a 1000 requests at once. Throughput wasn't amazing, but all of the requests went through eventually. Here is the code before changes, with no alpakka:
```
path("uploadfile") {
withRequestTimeout(20.seconds) {
storeUploadedFile("csv", tempDestination) {
case (metadata, file) =>
val uploadFuture = upload(file, file.toPath.getFileName.toString)
onComplete(uploadFuture) {
case Success(_) => complete(StatusCodes.OK)
case Failure(_) => complete(StatusCodes.FailedDependency)
}
}
}
}
}
case class S3UploaderException(msg: String) extends Exception(msg)
def upload(file: File, key: String): Future[String] = {
val s3Client = AmazonS3ClientBuilder.standard()
.withCredentials(new DefaultAWSCredentialsProviderChain())
.withRegion(Regions.EU_WEST_3)
.build()
val promise = Promise[String]()
val listener = new ProgressListener() {
override def progressChanged(progressEvent: ProgressEvent): Unit = {
(progressEvent.getEventType: @unchecked) match {
case ProgressEventType.TRANSFER_FAILED_EVENT => promise.failure(S3UploaderException(s"Uploading a file with a key: $key"))
case ProgressEventType.TRANSFER_COMPLETED_EVENT |
ProgressEventType.TRANSFER_CANCELED_EVENT => promise.success(key)
}
}
}
val request = new PutObjectRequest("S3_BUCKET", key, file)
request.setGeneralProgressListener(listener)
s3Client.putObject(request)
promise.future
}
```
When I changed this to use alpakka connector, the code looks much nicer as we can just connect the ByteSource
and alpakka Sink
together. However this approach cannot handle such a big load. When I execute 1000 requests at once (10 kb files) less than 10% go through and the rest fails with exception:
akka.stream.alpakka.s3.impl.FailedUpload: Exceeded configured max-open-requests value of [32]. This means that the request queue of this pool (HostConnectionPoolSetup(bargain-test.s3-eu-west-3.amazonaws.com,443,ConnectionPoolSetup(ConnectionPoolSettings(4,0,5,32,1,30 seconds,ClientConnectionSettings(Some(User-Agent: akka-http/10.1.3),10 seconds,1 minute,512,None,WebSocketSettings(,ping,Duration.Inf,akka.http.impl.settings.WebSocketSettingsImpl$$$Lambda$4787/1279590204@4d809f4c),List(),ParserSettings(2048,16,64,64,8192,64,8388608,256,1048576,Strict,RFC6265,true,Set(),Full,Error,Map(If-Range -> 0, If-Modified-Since -> 0, If-Unmodified-Since -> 0, default -> 12, Content-MD5 -> 0, Date -> 0, If-Match -> 0, If-None-Match -> 0, User-Agent -> 32),false,true,akka.util.ConstantFun$$$Lambda$4534/1539966798@69c23cd4,akka.util.ConstantFun$$$Lambda$4534/1539966798@69c23cd4,akka.util.ConstantFun$$$Lambda$4535/297570074@6b426c59),None,TCPTransport),New,1 second),akka.http.scaladsl.HttpsConnectionContext@7e0f3726,akka.event.MarkerLoggingAdapter@74f3a78b))) has completely filled up because the pool currently does not process requests fast enough to handle the incoming request load. Please retry the request later. See http://doc.akka.io/docs/akka-http/current/scala/http/client-side/pool-overflow.html for more information.
Here is how the summary of a Gatling test looks like:
---- Response Time Distribution ---------------------------------------- t < 800 ms 0 ( 0%)
800 ms < t < 1200 ms 0 ( 0%)
t > 1200 ms 90 ( 9%)
failed 910 ( 91%)
When I execute 100 of simultaneous requests, half of it fails. So, still close to satisfying.
This is a new code: ```
path("uploadfile") {
withRequestTimeout(20.seconds) {
extractRequestContext { ctx =>
implicit val materializer = ctx.materializer
extractActorSystem { actorSystem =>
fileUpload("csv") {
case (metadata, byteSource) =>
val uploadFuture = byteSource.runWith(S3Uploader.sink("s3FileKey")(actorSystem, materializer))
onComplete(uploadFuture) {
case Success(_) => complete(StatusCodes.OK)
case Failure(_) => complete(StatusCodes.FailedDependency)
}
}
}
}
}
}
def sink(s3Key: String)(implicit as: ActorSystem, m: Materializer) = {
val regionProvider = new AwsRegionProvider {
def getRegion: String = Regions.EU_WEST_3.getName
}
val settings = new S3Settings(MemoryBufferType, None, new DefaultAWSCredentialsProviderChain(), regionProvider, false, None, ListBucketVersion2)
val s3Client = new S3Client(settings)(as, m)
s3Client.multipartUpload("S3_BUCKET", s3Key)
}
```
The complete code with both endpoints can be seen here
I have a couple of questions.
1) Is this a feature? Is this what we can call a backpressure?
2) If I would like this code to behave like the old approach with a temporary file (no failed requests and all of them finish at some point) what do I have to do? I was trying to implement a queue for the stream (link to the source below), but this made no difference. The code can be seen here.
(* DISCLAIMER * I am still a scala newbie trying to quickly understand akka streams and find some workaround for the issue. There are big chances that there is something simple wrong in this code. * DISCLAIMER *)