I'm writing something of a profiler whose use case would be something like
long getTiming()
{
    long start = someTimeFunction();
    executeSomething();
    return someTimeFunction() - start;
}
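Concretely, with clock_gettime() the pattern I'm measuring looks something like this (just a sketch; executeSomething() stands in for the code being profiled, and the result is elapsed nanoseconds):
#include <time.h>

void executeSomething(); // placeholder for the code being profiled

long long getTimingNs()
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    executeSomething();
    clock_gettime(CLOCK_MONOTONIC, &end);
    // Combine seconds and nanoseconds into a single 64-bit nanosecond delta.
    return (end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}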
Whichever time function I use, it seems to add significant overhead. I've tried gettimeofday(), clock_gettime() with CLOCK_MONOTONIC, CLOCK_PROCESS_CPUTIME_ID, and CLOCK_THREAD_CPUTIME_ID, and I've tried a bit of assembly I found here to call rdtsc.
With 500,000 calls each, these are their costs:
[INFO] [ OK ] X.TimeGetTimeOfDay (1165 ms)
[INFO] [ OK ] X.TimeRdtscl (1208 ms)
[INFO] [ OK ] X.TimeMonotomicGetTime (1536 ms)
[INFO] [ OK ] X.TimeProcessGetTime (1575 ms)
[INFO] [ OK ] X.TimeThreadGetTime (1522 ms)
This is on a CentOS 5 VirtualBox VM running on a MacBook Pro.
Since I need to calculate a delta, I don't need absolute time, and there is no risk of comparing times obtained on different cores or CPUs on an SMP system.
Can I do any better?
Here are my test cases:
TEST(X, TimeGetTimeOfDay)
{
    for (int i = 0; i < 500000; i++) {
        struct timeval when;
        gettimeofday(&when, NULL);
    }
}
TEST(X, TimeRdtscl)
{
    for (int i = 0; i < 500000; i++) {
        unsigned long long when;
        rdtscl(&when);
    }
}
TEST(X, TimeMonotomicGetTime)
{
    for (int i = 0; i < 500000; i++) {
        struct timespec when;
        clock_gettime(CLOCK_MONOTONIC, &when);
    }
}
TEST(X, TimeProcessGetTime)
{
    for (int i = 0; i < 500000; i++) {
        struct timespec when;
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &when);
    }
}
TEST(X, TimeThreadGetTime)
{
    for (int i = 0; i < 500000; i++) {
        struct timespec when;
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &when);
    }
}
Here is the rdtscl() implementation I got from here:
inline void rdtscl(unsigned long long *t)
{
    unsigned long long l, h;
    /* RDTSC places the low 32 bits of the time-stamp counter in EAX and the high 32 bits in EDX. */
    __asm__ __volatile__ ("rdtsc" : "=a"(l), "=d"(h));
    *t = ((unsigned long long)l) | (((unsigned long long)h) << 32);
}