
So, I'm using Play framework 2.7 to set up a streaming server. What I'm trying to do is stream about 500 custom case class objects, all of similar size.

This is part of the controller that generates the stream -

def generate: Action[AnyContent] = Action {
    val products = (1 to 500).map(Product(_, "some random string")).toList
    Ok.chunked[Product](Source(products))
  }

where Product is the custom case class I'm using. An implicit Writeable serialises each object to JSON.
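For reference, here is a minimal framework-free sketch of what the case class and its JSON rendering might look like. The field names are inferred from the JSON output shown further below; the actual Writeable is Play-specific (e.g. one backed by play-json), so plain string interpolation stands in for it here:

```scala
// Hypothetical shape of the case class, inferred from the JSON shown below.
case class Product(id: Int, name: String)

// Roughly the JSON each element is rendered to on the wire. Plain string
// interpolation stands in for the Play Writeable so this sketch has no
// framework dependency.
def toJsonString(p: Product): String =
  s"""{ "id" : ${p.id}, "name" : "${p.name}" }"""

println(toJsonString(Product(1, "some random string")))
```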

and this is part of the controller that processes this stream -

def process(): Action[AnyContent] = Action.async {
    val request = ws.url(STREAMING_URL).withRequestTimeout(Duration.Inf).withMethod("GET")
    request.stream().flatMap {
      _.bodyAsSource
        .map(_.utf8String)
        .map { x => println(x); x }
        .fold(0) { (acc, _) => acc + 1 }
        .runWith(Sink.last)
        .andThen {
          case Success(v) => println(s"Total count - $v")
          case Failure(_) => println("Error encountered")
        }
    }.map(_ => Ok)
  }

What I expected is that each of my case class objects is transmitted as a single chunk and received likewise, so that each one can be individually deserialised and used by the receiver. In other words, with the code above I expect to receive exactly 500 chunks, but the count always comes out higher than that.

What I can see is that exactly one object among these 500 is split and transmitted in 2 chunks instead of 1.

This is a normal object, as seen on the receiving side -

{
  "id" : 494,
  "name" : "some random string"
}

and this is an object that's split in two -

{
  "id" : 463,
  "name" : "some random strin
g"
}

as such, this cannot be deserialised back into an instance of my Product case class.
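To illustrate that the bytes themselves arrive intact and only the chunk boundaries are misplaced, here is a framework-free sketch: concatenating the received chunks and re-cutting wherever the top-level brace depth returns to zero recovers every object, even the one that arrived split. (A real framing stage, such as Akka's JsonFraming, also handles braces inside string values; this payload contains none, so a plain depth counter suffices for the sketch.)

```scala
// Two received chunks where one object is split mid-string,
// mirroring the broken output shown above.
val chunks = List(
  """{ "id" : 462, "name" : "some random string" }{ "id" : 463, "name" : "some random strin""",
  """g" }"""
)

// Naive framing: scan the concatenated text and cut at every point
// where the brace depth drops back to zero.
def frame(s: String): List[String] = {
  val out = scala.collection.mutable.ListBuffer.empty[String]
  var depth = 0
  var start = 0
  for (i <- s.indices) {
    s(i) match {
      case '{' => if (depth == 0) start = i; depth += 1
      case '}' => depth -= 1; if (depth == 0) out += s.substring(start, i + 1)
      case _   =>
    }
  }
  out.toList
}

val objects = frame(chunks.mkString)
println(objects.size) // prints 2 - both objects recovered intact
```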

However, if I have some sort of throttling on the source in the sender controller, I receive the chunks just as expected.

For instance, this works completely fine where I stream only 5 elements per second -

def generate: Action[AnyContent] = Action {
    val products = (1 to 500).map(Product(_, "some random string")).toList
    Ok.chunked[Product](Source(products).throttle(5, 1.second))
  }

Can anyone help me understand why this happens?

gravetii
  • Just guess, but maybe without throttling the sender is producing the elements faster than it can send them, so it packs the chunk to the limit, and some string is divided. With throttling maybe it has enough time to put one element per chunk. – amorfis Nov 24 '20 at 09:48
  • This info might explain: `Another pitfall is that Actor messages can be lost and must be retransmitted in that case. Failure to do so would lead to holes at the receiving side.` from https://doc.akka.io/docs/akka/current/stream/stream-introduction.html#motivation . i guess you will have to handle backpressure. – Felipe Nov 28 '20 at 09:09
  • @Felipe I do not think that's a problem with streams - this is the case with actors and streams are built over actors to avoid these pitfalls. – gravetii Nov 29 '20 at 09:37
  • Are you sure this is not just an issue of concurrency of the `println`? Can you try using logging instead and see if it still reproduces? – Tomer Shetah Nov 30 '20 at 13:31
  • It's not, because I'm also counting the number of chunks on the receiver side. They're more than what's expected. – gravetii Nov 30 '20 at 14:35

1 Answer


As described here, there is a JsonFraming stage that separates complete, valid JSON objects from an incoming ByteString stream, regardless of where the chunk boundaries fall.

In your case, insert the framing stage before decoding to strings:

  _.bodyAsSource
    .via(JsonFraming.objectScanner(Int.MaxValue)) // emits one complete JSON object per element
    .map(_.utf8String)
Dmytro Maslenko