I have a RESTful blog server (http://jsonplaceholder.typicode.com) that responds to these two URIs:

/posts/{id}           : get a blog post by ID
/comments?postId={id} : get all the comments on a blog by the blog's id

Starting with a batch of blog IDs, I want to create a flow that executes this sequence of steps:

  1. Hit the /posts endpoint to get the JSON for a post
  2. Deserialize the JSON to a Blog case class
  3. Hit the /comments endpoint for each blog and fetch the JSON list of comments for that post
  4. Deserialize the comment JSON to a list of Comment case class instances
  5. Do some processing on the comments (stat collection or spam analysis)

Yes, I know I could skip straight to step 3 if I have a blog ID. Pretend I can't.

I would like to have many HTTP requests in flight to the server at once in step 1. To achieve that, I use a cachedHostConnectionPool. Here is what I have so far:

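import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{HttpMethods, HttpRequest, HttpResponse, StatusCodes}
import akka.http.scaladsl.settings.ConnectionPoolSettings
import akka.http.scaladsl.unmarshalling.Unmarshal
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}
import play.api.libs.json.{Json, Reads}
import scala.concurrent.Await
import scala.concurrent.duration.Duration
import scala.util.{Failure, Success, Try}

final case class Blog(id: Int, userId: Int, title: String, body: String)
final case class Comment(id: Int, postId: Int, name: String, email: String, body: String)

// Assumption: Json is Play JSON, so .as[Blog] and .as[List[Comment]] need implicit Reads in scope.
object Blog { implicit val reads: Reads[Blog] = Json.reads[Blog] }
object Comment { implicit val reads: Reads[Comment] = Json.reads[Comment] }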

object AkkaStreamProcessor extends App {
  implicit val actorSystem = ActorSystem("blogProcessor")
  import actorSystem.dispatcher
  implicit val flowMaterializer = ActorMaterializer()

  private def getBlogUri(id: Int): String = "/posts/" + id
  private def getCommentsUri(blog: Blog): String = "/comments?postId=" + blog.id
  private def parseBlogResponse(jsonResponse: String): Blog = Json.parse(jsonResponse).as[Blog]
  private def parseCommentsResponse(jsonResponse: String): List[Comment] = Json.parse(jsonResponse).as[List[Comment]]

  val pooledConnectionFlow = {
    val connectionSettings = ConnectionPoolSettings(actorSystem)
      .withMaxConnections(32)
      .withMaxOpenRequests(32)
      .withMaxRetries(3)
    Http().cachedHostConnectionPool[Int](host = "jsonplaceholder.typicode.com", settings = connectionSettings)
  }

  val source = Source(1 to 32)
  val fetchBlogsFlow = Flow[Int]
    .map((id: Int) => (getBlogUri(id),id))
    .map{ case(uri:String, id:Int) => (HttpRequest(method = HttpMethods.GET, uri = uri), id) }
    .via(pooledConnectionFlow)
    .map { case(response: Try[HttpResponse], id:Int) => handleBlogResponse(response, id) }
    .map((jsonText: Try[String]) => jsonText.map(j => parseBlogResponse(j)))

  val sink = Sink.foreach[Try[Blog]](blog => blog.map(b=> println(b)))
  source.via(fetchBlogsFlow).runWith(sink)

  private def handleBlogResponse(response: Try[HttpResponse], id: Int): Try[String] = {
    println(s"Received response for id $id on thread ${Thread.currentThread().getName}")
    response.flatMap((r: HttpResponse) => {
      r.status match {
        case StatusCodes.OK => {
          Success(Await.result(Unmarshal(r.entity).to[String], Duration.Inf))
        }
        case _ => Failure(new RuntimeException("Invalid response : " + r.status.toString()))
      }
    })
  }    
}

Now I want to create another flow for steps 3 and 4 and chain it after the first flow. However, I am struggling with the pesky Try[Blog] output from the first flow. How do I pipe a Try[Blog] into another HTTP request? Is there a way to split the pipeline, with failures going one way and successes going another?

Here is what I have for the second flow, but I'm not sure how to make the chaining work without calling get on the Try:

val processBlogsFlow = Flow[Try[Blog]]
  .map((tryBlog: Try[Blog]) => tryBlog.get)
  .map((blog: Blog) => (HttpRequest(method=HttpMethods.GET, uri=getCommentsUri(blog)), blog.id ))
  .via(pooledConnectionFlow)
bigh_29
  • If you want to discard failures, you can use `Flow.collect`. If you want to handle failures, you can partition the stream, e.g. using `Flow.groupBy` or `Partition`. What shall happen when a request results in a `Failure`? – devkat Sep 26 '16 at 13:42
  • It will be logged and probably re-queued for a retry in a later batch. – bigh_29 Sep 28 '16 at 12:20
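
The two options from the comment above can be sketched roughly as follows. This is only an illustration under some assumptions: `Try[Int]` stands in for `Try[Blog]`, the names `SplitTryExample`, `successesOnly`, `failureSink` and `run` are made up for the example, and `divertToMat` needs Akka 2.5.10 or later (on older versions `Partition` inside a `GraphDSL` graph would play the same role).

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Keep, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object SplitTryExample {
  // Option 1: drop failures and unwrap the successful values.
  val successesOnly = Flow[Try[Int]].collect { case Success(value) => value }

  // Side sink for failures. Here it only collects the messages;
  // it could just as well log them or re-queue the ids for a later batch.
  val failureSink =
    Flow[Try[Int]].collect { case Failure(ex) => ex.getMessage }.toMat(Sink.seq)(Keep.right)

  // Option 2: route failures to failureSink and let successes continue downstream.
  def run(results: List[Try[Int]]): (Seq[Int], Seq[String]) = {
    implicit val system: ActorSystem = ActorSystem("splitTryExample")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      val (failedF, succeededF) =
        Source(results)
          .divertToMat(failureSink, (t: Try[Int]) => t.isFailure)(Keep.right)
          .via(successesOnly)
          .toMat(Sink.seq)(Keep.both)
          .run()
      // Blocking here is only for the demo; a real pipeline would keep the futures.
      (Await.result(succeededF, 5.seconds), Await.result(failedF, 5.seconds))
    } finally system.terminate()
  }
}
```

Calling `SplitTryExample.run` with a mixed list of `Success` and `Failure` values should hand back the successful values and the failure messages as two separate sequences, each in its original order.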

1 Answer


There is a very good blog entry for dealing with Try. In your particular example I would preserve the Try so you can get information on the original failure:

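def blogToTuple(blog: Blog): (HttpRequest, Int) =
  (HttpRequest(method = HttpMethods.GET, uri = getCommentsUri(blog)), blog.id)

val processBlogsFlow: Flow[Try[Blog], Try[HttpResponse], _] =
  Flow[Try[Blog]]
    .map(_ map blogToTuple)
    .mapAsync(1) {
      case Success(req) =>
        // Send the request through the pool and keep only the Try[HttpResponse] part of the result
        Source.single(req).via(pooledConnectionFlow).runWith(Sink.head).map(_._1)
      case Failure(ex) =>
        // Pass the original failure downstream unchanged
        Future.successful(Failure[HttpResponse](ex))
    }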

Now the Try can be passed on to your Sink, which can report error messages as well as valid responses.
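
As a rough sketch of such a sink (not the only way to do it), something along these lines would produce one report line per element. The names `ReportingSinkExample`, `describe`, `reportingSink` and `run` are hypothetical, and collecting the lines with `Sink.seq` is just an assumption to keep the example easy to check; logging or re-queuing would fit in the same place.

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.model.HttpResponse
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Keep, Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object ReportingSinkExample {
  // Hypothetical helper: turns each outcome into a report line.
  def describe(result: Try[HttpResponse]): String = result match {
    case Success(response) => s"OK: comments request returned ${response.status.intValue}"
    case Failure(ex)       => s"FAILED: ${ex.getMessage}"
  }

  // Sink that accepts both outcomes and materializes the report lines.
  // In a real pipeline each successful response's entity would still have to be
  // consumed or discarded, otherwise the pooled connection stays occupied.
  val reportingSink: Sink[Try[HttpResponse], Future[Seq[String]]] =
    Flow[Try[HttpResponse]].map(describe).toMat(Sink.seq)(Keep.right)

  // Demo runner: feeds a fixed list of outcomes into the sink.
  def run(results: List[Try[HttpResponse]]): Seq[String] = {
    implicit val system: ActorSystem = ActorSystem("reportingSinkExample")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try Await.result(Source(results).runWith(reportingSink), 5.seconds)
    finally system.terminate()
  }
}
```

In the pipeline from the question, the same sink would be attached with something like `source.via(fetchBlogsFlow).via(processBlogsFlow).runWith(ReportingSinkExample.reportingSink)`.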

Tom
Ramón J Romero y Vigil