I'm trying to use Akka HTTP and Akka Streams to run a scraper. I'm starting with a bunch of index pages, parsing links out of them, then fetching each link and parsing that page, to return a bunch of individual links. So, like this:
fetch-top-level-page -> list-of-links-to-child-pages -> fetch-child-page -> list-of-links-in-child-page
My problem is that I'm not even able to fetch a single page. Each top-level URL I try to fetch results in a dead letter, and nothing ever makes it further down the pipeline.
In this sample code, all I'm trying to do is send HttpRequest
into the pool to be transformed into an HttpResponse
, and prove that it worked by printing stuff to the screen.
implicit val system = ActorSystem("scraper")
implicit val ec = system.dispatcher
implicit val settings = system.settings
implicit val materializer = ActorMaterializer()
val requests = List(HttpRequest(...), HttpRequest(...))
val poolClientFlow = Http().superPool[Promise[HttpResponse]](settings = ConnectionPoolSettings(system).withMaxConnections(10))
Source(requests)
.map (req => { println("-", req); req}) // this part runs fine
.via(poolClientFlow)
.map(resp => {println("|", resp); resp}) // this never runs
.toMat(Sink.foreach { p => println(p) })(Keep.both)
.run()
Here's what I get:
(-,(HttpRequest(...),Future(<not completed>)))
(-,(HttpRequest(...),Future(<not completed>)))
[INFO] [03/07/2018 15:10:03.681] [scraper-akka.actor.default-dispatcher-5] [akka://scraper/user/pool-master] Message [akka.http.impl.engine.client.PoolMasterActor$SendRequest] without sender to Actor[akka://scraper/user/pool-master#1333123700] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
[INFO] [03/07/2018 15:10:03.683] [scraper-akka.actor.default-dispatcher-5] [akka://scraper/user/pool-master] Message [akka.http.impl.engine.client.PoolMasterActor$SendRequest] without sender to Actor[akka://scraper/user/pool-master#1333123700] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
I'm new to Akka and obviously making some basic error, since this seems to be the exact use case that Akka, Akka Streams, and Akka HTTP were built for.
Any ideas?