The main differences with low latency timings are that:
- every micro-second counts. You will have an idea of how much each micro-second costs your business per year, and therefore how much development time it is worth spending to shave off each micro-second.
- you want to measure the 99th or even 99.99th percentile latencies (the worst 1% or 0.01% respectively).
- you want a fast clock, which is often limited to one host, or even one socket (you can measure low latency between hosts with specialist hardware). For multi-millisecond timings you can measure between hosts relatively easily, with just NTP configured.
- you want to minimise garbage, especially in your measurement code.
- it is quite likely you will need to develop application-specific tools which are embedded in the application and run in production. You can use profilers as a start, but most ultra-low-latency applications don't show anything useful in commercial profilers (nor do they GC much, if at all, when running).
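To make the points above concrete, here is a minimal sketch of what an embedded, garbage-free latency recorder might look like. The class name and sizes are illustrative, not from any particular library: samples go into a pre-allocated `long[]` so the hot path allocates nothing, and sorting/percentile reporting happens off the hot path (e.g. at the end of a run).

```java
import java.util.Arrays;

// Illustrative sketch: a garbage-free latency recorder embedded in the
// application. Recording a sample costs one array write and no allocation.
public class LatencyRecorder {
    private final long[] samples; // pre-allocated, so record() creates no garbage
    private int count;

    public LatencyRecorder(int capacity) {
        samples = new long[capacity];
    }

    // Hot path: no allocation, no boxing, no locking (single-threaded here).
    public void record(long startNanos, long endNanos) {
        if (count < samples.length)
            samples[count++] = endNanos - startNanos;
    }

    // Off the hot path: copy, sort and report, e.g. once per run.
    public long percentile(double p) {
        long[] copy = Arrays.copyOf(samples, count);
        Arrays.sort(copy);
        int idx = (int) (p * count);
        return copy[Math.min(idx, count - 1)];
    }

    public static void main(String[] args) {
        LatencyRecorder rec = new LatencyRecorder(1_000_000);
        for (int i = 0; i < 100_000; i++) {
            long start = System.nanoTime();
            // ... the operation under test would go here ...
            long end = System.nanoTime();
            rec.record(start, end);
        }
        System.out.printf("99%%ile: %d ns, 99.99%%ile: %d ns%n",
                rec.percentile(0.99), rec.percentile(0.9999));
    }
}
```

Note `System.nanoTime()` is the fast, monotonic clock here, but it is only meaningful within one JVM on one host, which is exactly why cross-host micro-second timing needs specialist hardware.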
You can have a read of my blog, Vanilla Java, for general low latency, high performance testing practices (some of these are nano-second based).