I have a huge HTTP response containing a JSON document, of which only a small portion is of interest. I cannot change the response structure. Here is an example:

{
  "searchString": "search",
  "redirectUrl": "",
  "0": {
    "numRecords": 123,
    "refinementViewModelCollector": {},
    // ... lots of data here
    "results": [
      {
        "productCode": "123",
        "productShortDescription": "Desc",
        "brand": "Brand",
        "productReview": {
          "reviewScore": 0
        },
        "priceView": {
          "salePriceDisplayable": false
        },
        "productImageUrl": "url",
        "alternateImageUrls": [
          "url1"
        ],
        "largeProductImageUrl": "url4",
        "videoUrl": ""
      },
      {
        "productCode": "124",
        "productShortDescription": "Desc",
        "brand": "Brand",
        "productReview": {
          "reviewScore": 0
        },
        "priceView": {
          "salePriceDisplayable": false
        },
        "preOrder": false,
        "productImageUrl": "url",
        "alternateImageUrls": [
          "url1"
        ],
        "largeProductImageUrl": "url4",
        "videoUrl": ""
      }
    ]
    // ... lots of data here
  }
}

The entries I'm interested in are in the "results" JSON array, but that array sits in the middle of the document.

I wrote a small Play WS client like this:

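import akka.stream.Materializer
import akka.stream.scaladsl.{JsonFraming, Sink}
import play.api.libs.ws.WSClient
import scala.concurrent.ExecutionContext
implicit val mat: Materializer = ???
implicit val ec: ExecutionContext = ???
val wsClient: WSClient = ???
val ret = wsClient.url("url").stream()
ret.flatMap { response =>
  // objectScanner frames whole top-level objects, so the entire document becomes one frame
  response.body.via(JsonFraming.objectScanner(1024))
    .map(_.utf8String)
    .runWith(Sink.foreach(println))
}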

This does not work, because objectScanner treats the whole JSON document as a single object. I need to skip data until the "results" entry appears in the stream, then parse its entries one at a time and skip everything after it. Any ideas how to do this?

Jeffrey Chung

3 Answers

Check out Alpakka's JSON module, which can stream specific parts of a nested JSON structure:

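import akka.stream.alpakka.json.scaladsl.JsonReader
import akka.stream.scaladsl.Sink
response
  .body
  // emit each element of the nested "results" array as its own ByteString
  .via(JsonReader.select("$.0.results[*]"))
  .map(_.utf8String)
  .runWith(Sink.foreach(println)) // or runForeach(println)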
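To get case classes rather than strings out of the stream, each selected element can be decoded on its own. The following is a minimal sketch, not part of the original answer, assuming akka-stream, the Alpakka JSON streaming module and play-json are on the classpath. ProductSummary, AlpakkaResults and parseResults are hypothetical names. The JsonPath expression is copied verbatim from the select call above; it is an assumption, not verified here, that the underlying path parser accepts the numeric key "0". A chunked in-memory Source stands in for response.body so the sketch runs without a server.

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, Materializer}
import akka.stream.alpakka.json.scaladsl.JsonReader
import akka.stream.scaladsl.{Sink, Source}
import akka.util.ByteString
import play.api.libs.json.{Json, Reads}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object AlpakkaResults {
  // Hypothetical model holding only the fields of interest; Json.reads ignores the rest.
  final case class ProductSummary(productCode: String, brand: String)
  implicit val productReads: Reads[ProductSummary] = Json.reads[ProductSummary]

  // Selects each element of the nested "results" array and decodes it separately,
  // so the surrounding document is never built as one JSON tree.
  def parseResults(body: Source[ByteString, _])(implicit mat: Materializer): Future[Seq[ProductSummary]] =
    body
      .via(JsonReader.select("$.0.results[*]")) // assumed path, taken from the answer
      .map(bs => Json.parse(bs.utf8String).as[ProductSummary])
      .runWith(Sink.seq) // collects everything: for the demo only

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("json-demo")
    // Akka 2.5 style; deprecated in Akka 2.6, where an implicit ActorSystem suffices.
    implicit val mat: Materializer = ActorMaterializer()
    val json =
      """{"searchString":"search","0":{"numRecords":123,
        |"results":[{"productCode":"123","brand":"Brand"},
        |{"productCode":"124","brand":"Brand"}],"more":{}}}""".stripMargin
    // Stand-in for response.body: the document arrives in 16-character chunks.
    val body = Source(json.grouped(16).map(ByteString(_)).toList)
    try Await.result(parseResults(body), 10.seconds).foreach(println)
    finally system.terminate()
  }
}
```

In real code you would keep the stream open-ended (for example with runForeach) instead of collecting into Sink.seq, which is used here only so the demo has something to print.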
Jeffrey Chung

There are JSON parsers that support streaming. For a good example, see this Circe example: https://github.com/circe/circe/tree/master/examples/sf-city-lots

Shane Delmore
  • The approaches described work well for streaming JSON or a big top-level array, but do not address this case, where a large array is contained in an object. – Ryan Bair Aug 14 '18 at 14:58

I'd love a better, Scala-specific answer to this question, but check out the "Mixed Reads Example" in the documentation for Google's Gson library:

https://sites.google.com/site/gson/streaming

Gson also supports mixed streaming & object model access. This lets your application have the best of both worlds: the productivity of object model access with the efficiency of streaming ... This code reads a JSON document containing an array of messages. It steps through array elements as a stream to avoid loading the complete document into memory. It is concise because it uses Gson’s object-model to parse the individual messages

This should keep memory usage low (the code reads from a Java InputStream, so the full structure is never in memory), but it may take some effort to get your results into Scala case classes.
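The Gson guide's mixed-reads example is written in Java. Below is a minimal Scala sketch of the same idea applied to the question's payload, assuming Gson is on the classpath: Gson's streaming JsonReader walks the document, skipValue() discards every member that is not on the path "0" → "results" without building it, and Gson's object model is used for one array element at a time. GsonResults, ProductSummary, seekKey and foreachResult are hypothetical names. Each element is decoded into a JsonObject and copied into the case class by hand, which sidesteps relying on Gson's reflection with Scala case classes (they have no no-arg constructor).

```scala
import com.google.gson.{Gson, JsonObject}
import com.google.gson.stream.JsonReader
import java.io.{InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets

object GsonResults {
  // Hypothetical model holding only the fields of interest.
  final case class ProductSummary(productCode: String, brand: String)

  private val gson = new Gson()

  // With `reader` positioned inside an object, advances to the value of `key`,
  // skipping every other member without materialising it.
  // Returns false if the key is not present in this object.
  private def seekKey(reader: JsonReader, key: String): Boolean = {
    while (reader.hasNext()) {
      if (reader.nextName() == key) return true
      reader.skipValue()
    }
    false
  }

  // Calls `f` for each element of the "results" array inside the "0" object,
  // holding only one element in memory at a time.
  def foreachResult(in: InputStream)(f: ProductSummary => Unit): Unit = {
    val reader = new JsonReader(new InputStreamReader(in, StandardCharsets.UTF_8))
    try {
      reader.beginObject() // top-level object
      if (seekKey(reader, "0")) {
        reader.beginObject() // the "0" object
        if (seekKey(reader, "results")) {
          reader.beginArray()
          while (reader.hasNext()) {
            // Object-model access for this single element only.
            val obj = gson.fromJson[JsonObject](reader, classOf[JsonObject])
            // Throws if a field is missing; acceptable for a sketch.
            f(ProductSummary(obj.get("productCode").getAsString, obj.get("brand").getAsString))
          }
          // Whatever follows the array is never read.
        }
      }
    } finally reader.close()
  }
}
```

With a Play WS response, the InputStream could come from the response body; the parser itself only needs a blocking stream, so it should run off the request-handling threads.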

Roberto Tyley