0

I use the json4s library in many projects, and I recently noticed a weird behavior that I'm not able to explain while debugging some unit tests.

The issue is quite simple: when serializing a case class into a string, the first serialization performed takes an extremely long time. Subsequent write operations, go much, much faster. Here is some sample code that displays this behavior :

import org.json4s.{DefaultFormats, Formats}
import org.json4s.jackson.Serialization.write

object Timer {
  def time[R](message : String)(block: => R): R = {
    val t0 = System.nanoTime()
    val result = block
    val t1 = System.nanoTime()
    println(s"$message : Elapsed time: " + (t1 - t0)*1e-6  + "ms")
    result
  }
}

case class Dummy(attr1 : String, attr2 : Int, attr3 : String, attr4 : Long, attr5 : Option[Int])

object Main extends App {

  implicit val formats: Formats = DefaultFormats

  val obj = Dummy("hello", 3, "", 4L, None)

  Timer.time("First pass") {
    write(obj)
  }

  Timer.time("Second pass") {
    write(obj)
  }
}

On my computer, this displays:

First pass : Elapsed time: 503.460855ms
Second pass : Elapsed time: 0.569251ms

I am using:

  • scala version 2.13.3
  • json4s-jackson version 3.6.10

If you want to reproduce this issue, I have created a small repository with this example : https://github.com/Thundzz/json4s-perf-check

I could not find any resource mentioning this performance issue, be it on the json4s readme, in the github issues or here on stackoverflow.

To be clear, this is not very problematic since only the first execution takes a lot of time, but I expected this behavior to maybe be documented somewhere at least, which doesn't seem to be the case.

  • Am I doing something wrong in my usage of the library ?
  • Is this expected behavior ? I am also wondering if this would warrant an issue or PR on the json4s repository. Even just mentioning this performance quirk in the readme would probably be helpful to other users who might get surprised when they notice this too.

Notes:

  • I know that the code example above is not a proper benchmark. However, it is quite hard to benchmark a "first call" to a function by using tools like ScalaMeter, and although it has many flaws, the code above produces consistent timing behaviour (the first call is orders of magnitude slower in terms of execution time)
E_net4
  • 27,810
  • 13
  • 101
  • 139
Thundzz
  • 675
  • 1
  • 11
  • 15
  • 1
    First, this is not a correct way to benchmark the JVM so this finding is meaningless _(not trying to be rude)_. Second, since **jackson** uses runtime reflection it is not surprising that it is slow, specially it first call, maybe latter it cache things. – Luis Miguel Mejía Suárez Dec 27 '20 at 15:03
  • @LuisMiguelMejíaSuárez I agree with you that this setup is not ideal, but on the other hand this is not supposed to be a proper "benchmark". It is, though, a deterministic way of reproducing a huge timing difference. If you have any suggestion at all on how I could display the problem in a more precise fashion knowing that it only appears on the first call, your suggestions are welcome. – Thundzz Dec 27 '20 at 16:14
  • I have tried with the **native** flavour of json4s and I am getting the same discrepancies: the execution time of the first call is in the range of the hundreds of milliseconds whereas subsequent calls are below 1ms – Thundzz Dec 27 '20 at 16:21
  • If you have a look at the repository I linked, you can notice that I have included a "pre-heating" loop there before making the first call to json4s in order to rule this possibility out. I have phased out that code here in order to be more concise because I figured someone would tell me it was irrelevant regarding the problem I am reporting. Again, that's probably not the best way to phase out the JVM start time, so if you have an alternative, it's welcome ! :) – Thundzz Dec 27 '20 at 16:25
  • https://scalameter.github.io/ - Once again, not sure if this is the best place to ask that, maybe it would be better as an issue in their github repo? – Luis Miguel Mejía Suárez Dec 27 '20 at 16:27
  • 1
    I agree, I will post there if I get no answer here after a few days. I posted here because their contribution guidelines indicate to ask questions on stackoverflow (ref: https://github.com/json4s/json4s/blob/master/CONTRIBUTING.md). For the ScalaMeter suggestion, I have considered it but it seems that it's based on repeating the same function call many times in order to remove noise. If my observation is really caused by reflection taking time on the first execution, I'm not sure I can test "first" calls with it... In any case, thanks for the suggestion... I will continue looking into it. – Thundzz Dec 27 '20 at 16:55
  • 1
    Just to be clear on why I'm insisting on this: for an API that handles user calls, it makes a real difference if the first call after a service reboot takes an additional several hundred milliseconds. If you factor into this that several (in my case at least) microservices are often re-deployed at the same time, this sort of issue really makes a "transparent" deployment impossible, or calls for some code adjustment (as in serializing every kind of possible response ahead of time, before making the API available to external calls). – Thundzz Dec 27 '20 at 16:55
  • 1
    One thing you can do to avoid slow API calls is to load one instance as part of the system start up. – Tomer Shetah Dec 27 '20 at 19:57
  • 1
    From a quick browse of Json4s's source, it looks like they first construct their JSON AST `JValue` and then hand that to Jackson. So if Jackson reflection is the problem, then serializing a dummy object which exercises the AST (e.g. has arrays, objects, numbers, strings, booleans, maybe even a null) at startup might be all you have to do. – Levi Ramsey Dec 28 '20 at 03:28
  • 2
    However, on further browsing, constructing the AST is itself using reflection... it may be more useful to use a Json library which takes advantage of Scala's macros to derive the serializer at compile time. – Levi Ramsey Dec 28 '20 at 03:42

0 Answers0