
I took this simple example from the Akka HTTP docs: http://doc.akka.io/docs/akka-http/current/scala/http/client-side/request-level.html

I modified it slightly to issue one hundred requests. The application blocks after 32 requests (the default value of max-open-requests). Why?

import akka.actor.{Actor, ActorLogging, ActorSystem, Props}
import akka.http.scaladsl.Http
import akka.http.scaladsl.model._
import akka.stream.{ActorMaterializer, ActorMaterializerSettings}
import akka.util.ByteString

import scala.io.StdIn

object AkkaClientExample extends App {
  val system: ActorSystem = ActorSystem("BatchAkka")
  try {
    val unformattedAddresses = (1 to 100).map(i => s"Rue de la Gracieuse $i, Préverenges, Switzerland")

    val googleGeocoder = system.actorOf(GoogleGeocoder.props, "GoogleGeocoder")

    unformattedAddresses.foreach(e => googleGeocoder ! GoogleGeocoder.GeoCode(e))

    println(">>> Press ENTER to exit <<<")
    StdIn.readLine()
  } finally {
    system.terminate()
  }
}

object GoogleGeocoder {
  def props: Props = Props[GoogleGeocoder]

  final case class GeoCode(unformattedAddress: String)
}

class GoogleGeocoder extends Actor with ActorLogging {
  import GoogleGeocoder._
  import akka.pattern.pipe
  import context.dispatcher

  final implicit val materializer: ActorMaterializer = ActorMaterializer(ActorMaterializerSettings(context.system))

  val http = Http(context.system)  

  def receive = {
    case GeoCode(unformattedAddress) =>
      log.info(s"GeoCode $unformattedAddress")
      http
        .singleRequest(HttpRequest(uri = url(unformattedAddress)))
        .map(r => (unformattedAddress, r))
        .pipeTo(self)

    case (unformattedAddress: String, resp @ HttpResponse(StatusCodes.OK, headers, entity, _)) =>
      log.info(s"Success response comming for $unformattedAddress")
      entity.dataBytes.runFold(ByteString(""))(_ ++ _).foreach { body =>
        val response = body.utf8String.replaceAll("\\s+", " ").take(50)
        log.info(s"Success response for $unformattedAddress: $response")
      }

    case (unformattedAddress: String, resp @ HttpResponse(code, _, _, _)) =>
      log.info(s"Request failed, response code: $code for $unformattedAddress")
      resp.discardEntityBytes()
  }

  def url(unformattedAddress: String): String =
    //s"https://maps.googleapis.com/maps/api/geocode/json?address=${URLEncoder.encode(unformattedAddress, "UTF-8")}&key=${URLEncoder.encode(googleApiKey, "UTF-8")}"
    s"https://www.epfl.ch/"
}

Output:

[INFO] [07/28/2017 20:08:26.977] [BatchAkka-akka.actor.default-dispatcher-4] [akka://BatchAkka/user/GoogleGeocoder] GeoCode Rue de la Gracieuse 1, Préverenges, Switzerland
[INFO] [07/28/2017 20:08:27.080] [BatchAkka-akka.actor.default-dispatcher-4] [akka://BatchAkka/user/GoogleGeocoder] GeoCode Rue de la Gracieuse 2, Préverenges, Switzerland
...
[INFO] [07/28/2017 20:08:27.098] [BatchAkka-akka.actor.default-dispatcher-13] [akka://BatchAkka/user/GoogleGeocoder] GeoCode Rue de la Gracieuse 99, Préverenges, Switzerland
[INFO] [07/28/2017 20:08:27.098] [BatchAkka-akka.actor.default-dispatcher-13] [akka://BatchAkka/user/GoogleGeocoder] GeoCode Rue de la Gracieuse 100, Préverenges, Switzerland

[INFO] [07/28/2017 20:08:27.615] [BatchAkka-akka.actor.default-dispatcher-11] [akka://BatchAkka/user/GoogleGeocoder] Success response comming for Rue de la Gracieuse 1, Préverenges, Switzerland
[INFO] [07/28/2017 20:08:27.620] [BatchAkka-akka.actor.default-dispatcher-11] [akka://BatchAkka/user/GoogleGeocoder] Success response comming for Rue de la Gracieuse 4, Préverenges, Switzerland
[INFO] [07/28/2017 20:08:27.668] [BatchAkka-akka.actor.default-dispatcher-17] [akka://BatchAkka/user/GoogleGeocoder] Success response for Rue de la Gracieuse 4, Préverenges, Switzerland: <!doctype html><html lang="fr" class="no-js"><head
[INFO] [07/28/2017 20:08:27.668] [BatchAkka-akka.actor.default-dispatcher-21] [akka://BatchAkka/user/GoogleGeocoder] Success response for Rue de la Gracieuse 1, Préverenges, Switzerland: <!doctype html><html lang="fr" class="no-js"><head
...
[INFO] [07/28/2017 20:08:27.787] [BatchAkka-akka.actor.default-dispatcher-5] [akka://BatchAkka/user/GoogleGeocoder] Success response comming for Rue de la Gracieuse 31, Préverenges, Switzerland
[INFO] [07/28/2017 20:08:27.795] [BatchAkka-akka.actor.default-dispatcher-15] [akka://BatchAkka/user/GoogleGeocoder] Success response comming for Rue de la Gracieuse 32, Préverenges, Switzerland
[INFO] [07/28/2017 20:08:27.802] [BatchAkka-akka.actor.default-dispatcher-16] [akka://BatchAkka/user/GoogleGeocoder] Success response for Rue de la Gracieuse 31, Préverenges, Switzerland: <!doctype html><html lang="fr" class="no-js"><head
[INFO] [07/28/2017 20:08:27.806] [BatchAkka-akka.actor.default-dispatcher-17] [akka://BatchAkka/user/GoogleGeocoder] Success response for Rue de la Gracieuse 32, Préverenges, Switzerland: <!doctype html><html lang="fr" class="no-js"><head

It blocks after the first 32 requests.


Update, taking into account @shutty's answer:

I've modified the program as follows, and it works:

class GoogleGeocoder extends Actor with ActorLogging {
  import GoogleGeocoder._
  import akka.actor.Status
  import akka.pattern.pipe
  import context.dispatcher

  final implicit val materializer: ActorMaterializer = ActorMaterializer(ActorMaterializerSettings(context.system))

  val http = Http(context.system)

  val queue = new scala.collection.mutable.Queue[String]
  var currentRequests = 0
  val MaxCurrentRequest = 10

  def receive = {
    case GeoCode(unformattedAddress) =>
      if (currentRequests < MaxCurrentRequest)
        query(unformattedAddress)
      else
        queue += unformattedAddress

    case (unformattedAddress: String, resp @ HttpResponse(StatusCodes.OK, headers, entity, _)) =>
      log.info(s"Success response coming for $unformattedAddress")
      // pipe the body back to self instead of mutating actor state inside a Future callback
      entity.dataBytes.runFold(ByteString(""))(_ ++ _)
        .map(body => (unformattedAddress, body))
        .pipeTo(self)

    case (unformattedAddress: String, body: ByteString) =>
      currentRequests = currentRequests - 1
      queryNext()
      val response = body.utf8String.replaceAll("\\s+", " ").take(50)
      log.info(s"Success response for $unformattedAddress: $response")

    case (unformattedAddress: String, resp @ HttpResponse(code, _, _, _)) =>
      log.info(s"Request failed, response code: $code for $unformattedAddress")
      resp.discardEntityBytes()
      currentRequests = currentRequests - 1
      queryNext()

    case f: Status.Failure =>
      log.error(f.cause, "failure")

    case m =>
      log.info("unexpected message: " + m)
  }

  def query(unformattedAddress: String): Unit = {
    log.info(s"GeoCode $unformattedAddress")
    currentRequests = currentRequests + 1 // count this request as in flight
    http
      .singleRequest(HttpRequest(uri = url(unformattedAddress)))
      .map(r => (unformattedAddress, r))
      .pipeTo(self)
  }

  def queryNext(): Unit = {
    if (queue.nonEmpty) {
      query(queue.dequeue())
    }
  }

  def url(unformattedAddress: String): String =
    //s"https://maps.googleapis.com/maps/api/geocode/json?address=${URLEncoder.encode(unformattedAddress, "UTF-8")}&key=${URLEncoder.encode(googleApiKey, "UTF-8")}"
    s"https://www.epfl.ch/"
}

So, basically, I added a queue.

However, is there a better way to achieve this?

I can imagine cases where this implementation could fail. For instance, if http.singleRequest returns a failed future, currentRequests will not be decremented. I could handle this in the case f: Status.Failure branch, but this solution still looks very error-prone.
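For the failed-future case, one option is to turn every outcome into a successful (address, Try) pair before piping it to the actor, so that even a failure tells the actor which request finished and the counter can always be decremented. This is a minimal sketch assuming Scala 2.12+ (for Future.transform); TaggedResult and tagged are hypothetical names:

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Success, Try}

object TaggedResult {
  // Hypothetical helper: the returned Future always succeeds with (address, outcome),
  // where outcome is the original Success or Failure, so failures keep their address.
  def tagged[A](address: String, f: Future[A])(implicit ec: ExecutionContext): Future[(String, Try[A])] =
    f.transform(outcome => Success((address, outcome)))
}
```

In the actor this would be used as TaggedResult.tagged(unformattedAddress, http.singleRequest(...)).pipeTo(self), with receive matching (addr: String, Success(resp: HttpResponse)) and (addr: String, Failure(ex)) and decrementing currentRequests in both branches.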

Perhaps Akka already provides some mechanism to handle such a queue?

Is there a way to add back-pressure to the client, so that unformattedAddresses.foreach(e => googleGeocoder ! GoogleGeocoder.GeoCode(e)) in AkkaClientExample blocks once MaxCurrentRequest is reached?

David Portabella

1 Answer


If you run your example with akka.loglevel = DEBUG, you'll see the following output:

InputBuffer (max-open-requests = 32) now filled with 31 request after enqueuing GET / Empty
InputBuffer (max-open-requests = 32) now filled with 32 request after enqueuing GET / Empty
InputBuffer (max-open-requests = 32) exhausted when trying to enqueue GET / Empty
InputBuffer (max-open-requests = 32) exhausted when trying to enqueue GET / Empty
InputBuffer (max-open-requests = 32) exhausted when trying to enqueue GET / Empty
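To reproduce this, DEBUG logging can be switched on through the configuration passed to the ActorSystem. This is a minimal sketch assuming Typesafe Config, which Akka reads its settings from; the object name DebugPoolConfig is hypothetical. The second setting only makes the per-host limit from the log explicit: 32 is the default, and raising it (the value must be a power of two) merely postpones the overflow.

```scala
import com.typesafe.config.{Config, ConfigFactory}

object DebugPoolConfig {
  // DEBUG logging makes the pool's InputBuffer messages visible;
  // max-open-requests is set to its default here just to show where the limit lives.
  val config: Config = ConfigFactory
    .parseString(
      """
        |akka.loglevel = "DEBUG"
        |akka.http.host-connection-pool.max-open-requests = 32
        |""".stripMargin)
    .withFallback(ConfigFactory.load())
}
```

Pass it when creating the system: ActorSystem("BatchAkka", DebugPoolConfig.config).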

There is a quite comprehensive description of how akka-http handles pooling of client requests, but in short: if you overload the pool with more than max-open-requests requests, it starts rejecting the extra ones by failing the Future returned by singleRequest:

http
  .singleRequest(HttpRequest(uri = url(unformattedAddress)))
  .map(r => (unformattedAddress, r)) // <- HERE
  .pipeTo(self)

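When you map over a Future in Scala, your callback runs only if the Future completes successfully, which is not the case here. pipeTo then delivers the failure to the actor as an akka.actor.Status.Failure message, which your receive does not handle, so it goes unnoticed. If you rewrite the code slightly differently, like this: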

import scala.util.{Failure, Success}

http
  .singleRequest(HttpRequest(uri = url(unformattedAddress)))
  .onComplete {
    case Success(r) =>
      self ! ((unformattedAddress, r))
    case Failure(ex) =>
      log.error(ex, "pool overflow")
  }

you'll see a number of exceptions logged for the failed Futures.
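The difference between map and a callback that sees both outcomes can be shown with plain Scala futures. This is a minimal sketch: the RuntimeException stands in for the pool-overflow failure and is not akka-http's own exception type, and Future.transform (Scala 2.12+) is used in place of onComplete so the result can be inspected.

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

object MapVsOnComplete {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Stand-in for what singleRequest returns once the pool's buffer is full.
  val overflowed: Future[String] = Future.failed(new RuntimeException("pool overflow"))

  // map only transforms a successful value: its callback never runs here,
  // and the failure propagates unchanged (pipeTo would turn it into Status.Failure).
  val mapped: Future[String] = overflowed.map(r => s"got $r")

  // transform, like onComplete, is invoked for both Success and Failure.
  val observed: Future[String] = overflowed.transform {
    case Success(r)  => Success(s"got $r")
    case Failure(ex) => Success(s"failed: ${ex.getMessage}")
  }
}
```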


Updated:

In my opinion, mixing actors and streams is not a great fit when you need back-pressure. One option is to rewrite your code without actors entirely:

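import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.HttpRequest
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

def url(addr: String) = "http://httpbin.org/headers"
implicit val system: ActorSystem = ActorSystem("BatchAkka")
implicit val mat: ActorMaterializer = ActorMaterializer()
import system.dispatcher
val http = Http()
val addresses = (1 to 100).map(i => s"Rue de la Gracieuse $i, Préverenges, Switzerland")
Source(addresses)
  // at most 4 requests in flight; upstream is back-pressured beyond that
  .mapAsync(4)(addr => http.singleRequest(HttpRequest(uri = url(addr))))
  .map { response =>
    println(response.status)
    response.discardEntityBytes() // always consume or discard the entity, or the connection stalls
  }
  .runWith(Sink.seq)
  .map(_ => println("done"))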

With this solution, at most 4 requests are in flight to the server at any time, with back-pressure and all the bells and whistles.

shutty
  • OK, I see the problem; how do I solve it? I've implemented a simple idea using a queue, and it works. However, there must be a better way. I've updated the question with this solution and the questions it raises. – David Portabella Jul 31 '17 at 19:49
  • I see these options: 1) add throttling to the stream (the .throttle() method) so as not to overload the pool; 2) get rid of actors completely and query the geocoder directly inside the stream with .mapAsync(). – shutty Aug 01 '17 at 14:05
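Option 1 (throttling the stream) can be combined with mapAsync from the answer. This is a minimal sketch assuming Akka Streams 2.5 (ActorMaterializer and the four-argument throttle); Throttled and throttledCalls are hypothetical names, and the call is passed in as a function so any Future-returning request fits.

```scala
import akka.stream.{ActorMaterializer, ThrottleMode}
import akka.stream.scaladsl.{Sink, Source}
import scala.collection.immutable
import scala.concurrent.Future
import scala.concurrent.duration._

object Throttled {
  // Emits at most `perSecond` items per second (burst of `perSecond`) and keeps
  // at most `parallelism` calls in flight; results are returned in input order.
  def throttledCalls[A, B](items: immutable.Seq[A], perSecond: Int, parallelism: Int)(call: A => Future[B])(
      implicit mat: ActorMaterializer): Future[immutable.Seq[B]] =
    Source(items)
      .throttle(perSecond, 1.second, perSecond, ThrottleMode.Shaping)
      .mapAsync(parallelism)(call)
      .runWith(Sink.seq)
}
```

For the geocoder, call could be addr => Http().singleRequest(HttpRequest(uri = url(addr))), keeping in mind that each response entity must still be consumed or discarded.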