
I have a Linux application (written in C) which is constantly creating and deleting network namespaces. Typically a create/delete cycle takes around 300 ms.

During performance investigations, I ran strace on the program to find out which system call was taking the most time. From the strace output, it seems that the very first invocation of unshare takes just 4 ms on my system; however, subsequent invocations take close to 200 ms.

strace -p <pid> -T
unshare(CLONE_NEWNET)                   = 0 <0.004150>
unshare(CLONE_NEWNET)                   = 0 <0.192055>
unshare(CLONE_NEWNET)                   = 0 <0.192872>
unshare(CLONE_NEWNET)                   = 0 <0.190303>
unshare(CLONE_NEWNET)                   = 0 <0.193062>

The program is structured such that, at the start of a cycle (an internal control cycle), it creates a network namespace, and it deletes the namespace at the end of the cycle.

Just to experiment, I modified my application to not delete the network namespace at the end of a control cycle -- only create new network namespaces, but never delete them. This improved performance significantly, and I no longer see any latency in subsequent calls to the unshare system call. Each unshare system call takes 2-3 ms.

strace -p <pid> -T    
unshare(CLONE_NEWNET)                   = 0 <0.003102>
unshare(CLONE_NEWNET)                   = 0 <0.002980>
unshare(CLONE_NEWNET)                   = 0 <0.003070>
unshare(CLONE_NEWNET)                   = 0 <0.003124>
unshare(CLONE_NEWNET)                   = 0 <0.002952>

Clearly, deleting the network namespace is somehow delaying subsequent network-namespace creation.

What could be going on here? How can I debug this further?

I am using Linux kernel 3.12.9-301.fc20.x86_64.

  • To profile Linux kernel inner workings you can try the `perf` sampling profiler (part of the `linux-tools` package in Debian/Ubuntu; the kernel side is enabled in most Linux distributions). Just `perf record ./your_test` or `perf record -g ./your_test` (run as root) will show the kernel's functions. You may also be interested in ftrace tracing to see exact times. Intro to an ftrace wrapper: http://www.slideshare.net/brendangregg/linux-performance-tools-2014 (slides 73, 76 - [funcgraph](https://github.com/brendangregg/perf-tools/blob/master/kernel/funcgraph) by Gregg) or `trace-cmd` (http://lwn.net/Articles/410200/) – osgx Mar 03 '15 at 07:47
  • Thanks for the references!!! They helped a lot. I used ftrace to get kernel debug traces (added a filter for the SyS_setns and SyS_unshare functions). This revealed that the problem is due to a lock (probably net_mutex) which is causing the delay. – CuriousSoul Mar 03 '15 at 17:55
  • CuriousSoul, I think you could add your experience with ftrace/trace-cmd as an answer to your own question. – osgx Mar 03 '15 at 17:57

1 Answer


Linux ftrace is very useful for debugging/profiling system calls.

Refer to the following articles:

  1. http://lwn.net/Articles/365835/
  2. http://lwn.net/Articles/366796/

trace-cmd (http://lwn.net/Articles/410200/) provides a convenient front end for quick debugging. However, for fine-grained debugging, the native ftrace interface is better and fairly easy to use.

What worked for me --

  1. Instrument the code to enable/disable tracing only in the region of interest. This basically involves writing "1" (enable) or "0" (disable) to the file "tracing_on" under debugfs.
  2. Write the application's PID to the file "set_ftrace_pid" under debugfs.
  3. Add one or more system calls to "set_ftrace_filter". The available symbols are listed in the file "available_filter_functions" under debugfs.
  4. Write your favorite tracer to the file "current_tracer". I used function_graph.

Note that steps 1 and 2 are not mandatory, but they help filter a lot of noise from the output and give you only the data relevant to what you are investigating.

In summary, ftrace rules!!!