
I am profiling my application with oprofile on an ARM Cortex-A8, and I notice a lot of samples with the image name "[vectors] (tgid:20712 range:0xffff0000-0xffff1000)".

oprofile reports that this is responsible for 17% of my process time, so hopefully someone can explain what it is. I've searched extensively and can't find an explanation.

I was thinking perhaps something to do with exception handling?

  • Does your application use threads? Most of the helper page is dedicated to threading. However, for a Cortex-A8 the helpers are not needed; there are direct ways to do these operations. Either your libraries or your compile flags didn't tell the toolchain what a good CPU you have. `gettls()` is thread-local storage and `cmpxchg()` is for lock-free operations on an ARMv5 (you have an ARMv7). – artless noise Jul 31 '14 at 19:29
  • Yes, the process is threaded, but more importantly I think it is frequently copying an object protected by a mutex. Presumably the `cmpxchg()` function is used in the implementation of `pthread_mutex_lock`. – Robert Smith Aug 01 '14 at 11:40
  • [Eglibc 2.19](http://www.eglibc.org/cgi-bin/viewvc.cgi/branches/eglibc-2_19/libc/ports/sysdeps/arm/bits/atomic.h?revision=25243&view=markup) doesn't have this problem AFAIK. Older versions like [Eglibc 2.15](http://www.eglibc.org/cgi-bin/viewvc.cgi/branches/eglibc-2_15/ports/sysdeps/arm/bits/atomic.h?revision=16510&view=markup) do. It also seems to depend on the compiler options given to eglibc: `pthread_mutex_lock` will use different versions depending on the compile flags, and 2.19 will use the [atomic builtins](https://gcc.gnu.org/onlinedocs/gcc-4.9.1/gcc/_005f_005fatomic-Builtins.html). – artless noise Aug 01 '14 at 16:25
  • Are your profile numbers similar for `pthread_mutex_lock`? Usually the legacy *cmpxchg()* should be as fast as the newer mechanism (`ldrex`/`strex`) in the case of a single mutex. In the case of multiple mutexes, it may think there is contention when there is none and restart the code; this should be minor except in pathological cases. I.e., it is better to get rid of the `pthread_mutex_lock` than to attempt to get the optimized version for your CPU, if possible. – artless noise Aug 01 '14 at 16:34
  • We're currently building with GCC 4.3.3 and glibc 2.8, with plans to move to GCC 4.6. The mutex was being used to protect a reference count. I changed it to use GCC's builtin atomic functions (a sketch of that change follows these comments) and the performance is much improved, thanks for the help. – Robert Smith Aug 07 '14 at 08:30
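
A minimal sketch of that kind of change, assuming the reference count was previously guarded by a `pthread_mutex_t` and is switched to GCC's `__sync` builtins (the `struct object` layout and function names here are illustrative, not from the original code):

```c
#include <stdlib.h>

/* Hypothetical reference-counted object. */
struct object {
    int refcount;
    /* ... payload ... */
};

static void object_ref(struct object *o)
{
    /* Atomically increment the reference count; no mutex needed. */
    __sync_fetch_and_add(&o->refcount, 1);
}

static void object_unref(struct object *o)
{
    /* Atomically decrement; free the object on the last reference. */
    if (__sync_sub_and_fetch(&o->refcount, 1) == 0)
        free(o);
}
```

With `-march=armv7-a` and a new-enough GCC these builtins compile down to inline `ldrex`/`strex` sequences; older toolchains may still emit calls into library or kernel helpers, so it is worth checking the generated code.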

1 Answer


Linux uses the "high vectors" setting, which places the exception entry vectors at 0xffff0000; thus all system calls, interrupts, faults, etc. will pass through this page.

However, since the vectors page must always be present, the ARM kernel makes use of the otherwise wasted space in the rest of the page to house some user-accessible helper functions for a few things that would otherwise be difficult to implement in a portable way. Your process (most likely lower-level libraries) may well be making use of these too; since typical usage is to call their fixed addresses directly, there probably aren't any symbols to resolve for them.
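
As an illustration, these helpers live at fixed addresses near the top of that page, documented in the kernel's Documentation/arm/kernel_user_helpers.txt. Here is a minimal sketch of calling `__kuser_cmpxchg` directly, assuming an ARM Linux target (the `toy_lock` wrapper is hypothetical):

```c
#include <stdint.h>

/* __kuser_cmpxchg sits at the fixed address 0xffff0fc0 in the vectors
 * page. It returns 0 if *ptr held oldval and newval was stored,
 * non-zero otherwise. */
typedef int (*kuser_cmpxchg_t)(int32_t oldval, int32_t newval,
                               volatile int32_t *ptr);
#define __kuser_cmpxchg ((kuser_cmpxchg_t)0xffff0fc0)

/* Hypothetical spinlock acquire built on the helper: swap 0 -> 1,
 * retrying until the exchange succeeds. */
static void toy_lock(volatile int32_t *lock)
{
    while (__kuser_cmpxchg(0, 1, lock) != 0)
        ; /* lost the race; try again */
}
```

Since such calls jump straight to a fixed address rather than through a named symbol, the profiler has nothing to resolve them against, which is why they are lumped together under [vectors]. Older glibc/eglibc atomics (see the comments above) reach these same entry points.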

Notlikethat
  • The oprofile output shows that the process makes a LOT of calls to `pthread_mutex_lock`, and based on the link you gave it then makes sense that this would correlate with the [vectors] entries that I am seeing. – Robert Smith Aug 01 '14 at 11:37