Measuring DirectByteBuffer with async-profiler and perf

Question

I'm playing around with async-profiler and perf and decided to measure kernel activity for DirectByteBuffer disk IO. Here is the code (written in Scala, but its Java version should be obvious):

val path = Paths.get("/tmp/test")

val buffer = ByteBuffer.allocateDirect(4096 * 4096)

def main(args: Array[String]): Unit = {
  var fullReadsCount = 0
  while (true) {
    var bytesRead = 0
    var ch: SeekableByteChannel = null
    try {
      ch = Files.newByteChannel(path)
      while (bytesRead >= 0) {
        bytesRead = ch.read(buffer)
        buffer.clear()
      }
    } finally {
      if (ch != null)
        ch.close()
    }
    fullReadsCount += 1
    if(fullReadsCount % 100 == 0) println(fullReadsCount)
  }
}

I ran this code multiple times and did both perf and async-profiler and noticed the following result:

async-profiler

$~/profiler.sh -i 28169 -d 30 <pid>
 //.... stack traces ommited
     ns  percent  samples  top

264788732   61.02%     9317  copy_user_enhanced_fast_string_[k]
 41510919    9.57%     1467  generic_file_read_iter_[k]
  9333863    2.15%      331  find_get_entry_[k]
  4181131    0.96%      148  __radix_tree_lookup_[k]
  4057194    0.94%      143  copy_page_to_iter_[k]
  1860485    0.43%       63  __d_lookup_rcu_[k]
  1610407    0.37%       50  _raw_spin_unlock_irqrestore_[k]

perf sudo perf record -F 31499 -g -p <pid> -- sleep 30

Averagely among all runs I did I noticed that copy_user_enhanced_fast_string percentage is different in perf and async-profiler 61.02% vs 77.65%

QUESTION: Why is the percentage different for copy_user_enhanced_fast_string sampled by perf and async-profiler? I tried to provide the equal condition (Frequency and sampling period and I did not run both of the profiler simulatenously. 31499 Hz ≈ 28169 nanos).

Or my interpretation of the result is wrong?

score 4 · Accepted Answer · answered Sep 21 '18 at 02:15

The selected profiling interval (28 μs) is too short.
Check dmesg - there are probably kernel warnings like

perf interrupt took too long (18047 > 18000), lowering kernel.perf_event_max_sample_rate to 25000

async-profiler differs from perf in the mechanism of processing PMU events. perf just collects samples in the ring buffer while async-profiler sends a signal to the process in order to invoke application-specific callback. Typically there is no visible difference to a user, but when profiling frequency is too high, signals may introduce additional noise and affect process scheduling.

I suggest to increase profiling interval to at least 100 μs (10000 Hz). This should make measurements more reliable.

Measuring DirectByteBuffer with async-profiler and perf

1 Answers1