
I have an akka-http service and I am trying out the alpakka S3 connector for uploading files. Previously I was using a temporary file and then uploading with the Amazon SDK. That approach required some adjustments to make the Amazon SDK more Scala-like, but it could handle even 1000 requests at once. Throughput wasn't amazing, but all of the requests went through eventually. Here is the code before the changes, with no alpakka:

```

path("uploadfile") {
    withRequestTimeout(20.seconds) {
        storeUploadedFile("csv", tempDestination) {
            case (metadata, file) =>
                val uploadFuture = upload(file, file.toPath.getFileName.toString)

                onComplete(uploadFuture) {
                    case Success(_) => complete(StatusCodes.OK)
                    case Failure(_) => complete(StatusCodes.FailedDependency)
                }
        }
    }
}

case class S3UploaderException(msg: String) extends Exception(msg)

def upload(file: File, key: String): Future[String] = {
    val s3Client = AmazonS3ClientBuilder.standard()
        .withCredentials(new DefaultAWSCredentialsProviderChain())
        .withRegion(Regions.EU_WEST_3)
        .build()

    val promise = Promise[String]()

    val listener = new ProgressListener() {
        override def progressChanged(progressEvent: ProgressEvent): Unit = {
            (progressEvent.getEventType: @unchecked) match {
                case ProgressEventType.TRANSFER_FAILED_EVENT => promise.failure(S3UploaderException(s"Uploading a file with a key: $key"))
                case ProgressEventType.TRANSFER_COMPLETED_EVENT |
                     ProgressEventType.TRANSFER_CANCELED_EVENT => promise.success(key)
            }
        }
    }

    val request = new PutObjectRequest("S3_BUCKET", key, file)
    request.setGeneralProgressListener(listener)

    s3Client.putObject(request)

    promise.future
}

```

When I changed this to use the alpakka connector, the code looks much nicer, as we can just connect the ByteSource and the alpakka Sink together. However, this approach cannot handle such a big load. When I execute 1000 requests at once (10 KB files), fewer than 10% go through and the rest fail with this exception:

```
akka.stream.alpakka.s3.impl.FailedUpload: Exceeded configured max-open-requests value of [32]. This means that the request queue of this pool (HostConnectionPoolSetup(bargain-test.s3-eu-west-3.amazonaws.com,443,ConnectionPoolSetup(ConnectionPoolSettings(4,0,5,32,1,30 seconds,ClientConnectionSettings(Some(User-Agent: akka-http/10.1.3),10 seconds,1 minute,512,None,WebSocketSettings(,ping,Duration.Inf,akka.http.impl.settings.WebSocketSettingsImpl$$$Lambda$4787/1279590204@4d809f4c),List(),ParserSettings(2048,16,64,64,8192,64,8388608,256,1048576,Strict,RFC6265,true,Set(),Full,Error,Map(If-Range -> 0, If-Modified-Since -> 0, If-Unmodified-Since -> 0, default -> 12, Content-MD5 -> 0, Date -> 0, If-Match -> 0, If-None-Match -> 0, User-Agent -> 32),false,true,akka.util.ConstantFun$$$Lambda$4534/1539966798@69c23cd4,akka.util.ConstantFun$$$Lambda$4534/1539966798@69c23cd4,akka.util.ConstantFun$$$Lambda$4535/297570074@6b426c59),None,TCPTransport),New,1 second),akka.http.scaladsl.HttpsConnectionContext@7e0f3726,akka.event.MarkerLoggingAdapter@74f3a78b))) has completely filled up because the pool currently does not process requests fast enough to handle the incoming request load. Please retry the request later. See http://doc.akka.io/docs/akka-http/current/scala/http/client-side/pool-overflow.html for more information.
```

Here is what the summary of a Gatling test looks like:

```
---- Response Time Distribution ------------------------------------------
t < 800 ms                                               0 (  0%)
800 ms < t < 1200 ms                                     0 (  0%)
t > 1200 ms                                             90 (  9%)
failed                                                 910 ( 91%)
```


When I execute 100 simultaneous requests, half of them fail. So it is still not even close to satisfactory.

This is the new code:

```

path("uploadfile") {
    withRequestTimeout(20.seconds) {
        extractRequestContext { ctx =>
            implicit val materializer = ctx.materializer

            extractActorSystem { actorSystem =>

                fileUpload("csv") {

                    case (metadata, byteSource) =>

                        val uploadFuture = byteSource.runWith(S3Uploader.sink("s3FileKey")(actorSystem, materializer))

                        onComplete(uploadFuture) {
                            case Success(_) => complete(StatusCodes.OK)
                            case Failure(_) => complete(StatusCodes.FailedDependency)
                        }
                }            
            }
        }
    }
}

def sink(s3Key: String)(implicit as: ActorSystem, m: Materializer) = {
    val regionProvider = new AwsRegionProvider {
        def getRegion: String = Regions.EU_WEST_3.getName
    }

    val settings = new S3Settings(MemoryBufferType, None, new DefaultAWSCredentialsProviderChain(),
        regionProvider, false, None, ListBucketVersion2)
    val s3Client = new S3Client(settings)(as, m)

    s3Client.multipartUpload("S3_BUCKET", s3Key)
}

```

The complete code with both endpoints can be seen here.

I have a couple of questions.

1) Is this a feature? Is this what we call backpressure?

2) If I wanted this code to behave like the old approach with a temporary file (no failed requests, and all of them finishing at some point), what would I have to do? I tried implementing a queue for the stream (link to the source below), but it made no difference. The code can be seen here.

(*DISCLAIMER* I am still a Scala newbie trying to quickly understand Akka Streams and find a workaround for this issue. Chances are that there is something simple wrong in this code. *DISCLAIMER*)

Michał Kreft
  • You can have a look at [Benji](https://github.com/zengularity/benji) (Object storage/S3 based on akka) – cchantep Aug 25 '18 at 10:44
  • Did you read the link in the error message? (https://doc.akka.io/docs/akka-http/current/client-side/pool-overflow.html?language=scala) – Viktor Klang Aug 25 '18 at 18:01
  • @ViktorKlang I did. It gives a good explanation as to what happens and specifically this point fits my case (prevent peaks by tuning the client application or increase max-open-requests to buffer short-term peaks). Increasing max-open-requests doesn't help much and "tuning the client application" is exactly what I don't know how to achieve. – Michał Kreft Aug 26 '18 at 08:34
  • In addition to the tuning answer from Leszek please consider not creating one s3 client per incoming request: the `sink` used in the route should be a fixed uploader flow, perhaps even pre-materialized with an open input (Source.ActorRef or Reactive Streams interface). – Roland Kuhn Aug 27 '18 at 09:38
  • This is a good point @RolandKuhn . In my source code I create it only once. Here I have put things together for demonstration purposes. I will edit the question, so it's not an example of a bad practice. – Michał Kreft Aug 27 '18 at 10:03
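
A minimal sketch of what "creating it only once", as suggested in the comments above, could look like, assuming the alpakka 0.x API from the question (the class name and wiring are illustrative, not the actual project code):

```
import akka.actor.ActorSystem
import akka.stream.Materializer
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain
import com.amazonaws.regions.{AwsRegionProvider, Regions}
// alpakka 0.x S3 imports (S3Client, S3Settings, MemoryBufferType, ListBucketVersion2)
// are omitted here, as in the question's own snippets.

// One S3 client for the whole service, created at startup and shared by every request.
class S3Uploader(implicit system: ActorSystem, mat: Materializer) {

  private val regionProvider = new AwsRegionProvider {
    def getRegion: String = Regions.EU_WEST_3.getName
  }

  private val settings = new S3Settings(MemoryBufferType, None,
    new DefaultAWSCredentialsProviderChain(), regionProvider, false, None, ListBucketVersion2)

  private val s3Client = new S3Client(settings)

  // The route only asks for a sink; no client is built per incoming request.
  def sink(s3Key: String) = s3Client.multipartUpload("S3_BUCKET", s3Key)
}
```

The route then calls `sink(...)` on a single `S3Uploader` instance created when the service starts, instead of building a new `S3Client` per request.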

1 Answer


It’s a backpressure feature.

`Exceeded configured max-open-requests value of [32]`: in the config, `max-open-requests` is set to 32 by default. Streaming is meant for working with large amounts of data, not for handling many requests per second.

The Akka developers had to pick some value for `max-open-requests`, and they chose 32 for a reason. They cannot know what it will be used for: sending 1000 32 KB files at once, or 1000 1 GB files? They don't know. But they still want to make sure that by default (and probably 80% of people use the defaults) applications behave gracefully and safely, so they had to limit the processing power.

You asked to do 1000 uploads "now", but I am pretty sure the AWS SDK did not send 1000 files simultaneously either; it used some queue internally, which may be a good approach for you too if you have many small files to upload.
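
For example, one shared queue with bounded parallelism buffers peaks instead of failing them. This is a rough, untested sketch (not the code linked from the question); the names are illustrative, and `s3Sink` stands for something like the `S3Uploader.sink` from the question:

```
import scala.concurrent.{Future, Promise}
import scala.util.Success

import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import akka.stream.{ActorMaterializer, OverflowStrategy, QueueOfferResult}
import akka.util.ByteString

// Every request offers its upload to one shared queue; at most `parallelism`
// uploads run at the same time, so the S3 host connection pool is never flooded.
class QueuedUploader[R](s3Sink: String => Sink[ByteString, Future[R]],
                        parallelism: Int = 4,
                        bufferSize: Int = 1000)
                       (implicit system: ActorSystem, mat: ActorMaterializer) {

  import system.dispatcher

  private case class Job(key: String, data: Source[ByteString, Any], result: Promise[R])

  private val queue =
    Source.queue[Job](bufferSize, OverflowStrategy.dropNew)
      .mapAsync(parallelism) { job =>
        job.data
          .runWith(s3Sink(job.key))
          // Complete the caller's promise with the outcome, but never fail the shared stream.
          .transform(outcome => Success(job.result.complete(outcome)))
      }
      .to(Sink.ignore)
      .run()

  // Called from the route instead of running the upload directly.
  def upload(key: String, data: Source[ByteString, Any]): Future[R] = {
    val done = Promise[R]()
    queue.offer(Job(key, data, done)).flatMap {
      case QueueOfferResult.Enqueued => done.future
      case other                     => Future.failed(new RuntimeException(s"Upload rejected: $other"))
    }
  }
}
```

With `OverflowStrategy.dropNew`, requests beyond `bufferSize` are rejected explicitly instead of overflowing the connection pool; pick the strategy and sizes that match your load.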

But it is perfectly fine to tune it to your case! If you know that your machine and the target can take care of more simultaneous connections, you can raise the value.
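
For example, the client pool limits can be raised in `application.conf` or programmatically; the keys below are the standard Akka HTTP host-connection-pool settings, but the numbers are only an illustration (note that `max-open-requests` must be a power of two):

```
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

// Illustrative values only: allow more parallel connections and a bigger request queue per host.
val tuned = ConfigFactory.parseString(
  """
    |akka.http.host-connection-pool {
    |  max-connections   = 16
    |  max-open-requests = 256
    |}
  """.stripMargin
).withFallback(ConfigFactory.load())

implicit val system: ActorSystem = ActorSystem("uploader", tuned)
```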

Also, for a lot of HTTP calls, use a cached host connection pool.
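
For plain Akka HTTP client calls (not the Alpakka S3 connector, see the comments below), a cached host connection pool looks roughly like this; the host and request URIs are placeholders:

```
import scala.concurrent.Future
import scala.util.Try

import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{HttpRequest, HttpResponse}
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

implicit val system: ActorSystem = ActorSystem()
implicit val mat: ActorMaterializer = ActorMaterializer()

// One pool per host: requests are queued and sent over a bounded set of reused connections.
val pool = Http().cachedHostConnectionPoolHttps[Int]("example.s3.eu-west-3.amazonaws.com")

val results: Future[Seq[(Try[HttpResponse], Int)]] =
  Source(1 to 100)
    .map(i => HttpRequest(uri = s"/file-$i") -> i) // the Int is just a correlation id
    .via(pool)
    .runWith(Sink.seq)
// Remember to consume or discard each response entity, or the pool will back up.
```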

Leszek Gruchała
  • Do you know how to use `Http().cachedHostConnectionPool(...)` with Alpakka? In the docs there is just `val s3Client = new S3Client(settings)(system, materializer)` – Lukasz Lenart Jan 08 '19 at 08:53
  • I have not used Alpakka S3 connector, but it seems using it would be the only way to supply a connection pool. Please check their code: https://doc.akka.io/docs/alpakka/current/s3.html – Leszek Gruchała Jan 10 '19 at 08:28
  • Looks like Alpakka is already using a cached connection pool, with a default limit of 32 concurrent connections per host. So there is no other way than to increase `max-open-requests`. Another way is to use Akka Streams with backpressure all the way down, but this will slow down the whole system :\ – Lukasz Lenart Jan 11 '19 at 09:34
  • Then maybe don't use the S3 connector and wrap the official AWS SDK client around your streaming logic? The official client can take an InputStream and, if you know the total size of the stream, it can upload it with a low-memory multipart upload. – Leszek Gruchała Jan 11 '19 at 16:10
  • Maybe, we will see - it's code that I have taken over, so I don't want to mess anything up :) – Lukasz Lenart Jan 13 '19 at 10:52