
I'm writing something of a profiler whose use case would be something like

long getTiming() 
{
    long start = someGetTimeFunction();
    executeSomething();
    return someGetTimeFunction() - start;
}

Whichever time function I use seems to add significant overhead. I've tried gettimeofday(), clock_gettime() with CLOCK_MONOTONIC, CLOCK_PROCESS_CPUTIME_ID, and CLOCK_THREAD_CPUTIME_ID, and a bit of assembly I found here to call rdtsc.

With 500,000 iterations each, these are their costs:

[INFO] [       OK ] X.TimeGetTimeOfDay (1165 ms)
[INFO] [       OK ] X.TimeRdtscl (1208 ms)
[INFO] [       OK ] X.TimeMonotomicGetTime (1536 ms)
[INFO] [       OK ] X.TimeProcessGetTime (1575 ms)
[INFO] [       OK ] X.TimeThreadGetTime (1522 ms)

This is on a CentOS 5 VirtualBox VM running on a MacBook Pro.

Since I need to calculate a delta, I don't need absolute time, and there is no risk of comparing times obtained on different cores or CPUs on an SMP system.

Can I do any better?

Here are my test cases (Google Test; the clock calls need <sys/time.h> and <time.h>):

#include <gtest/gtest.h>
#include <sys/time.h>
#include <time.h>

TEST(X, TimeGetTimeOfDay)
{    
    for (int i = 0; i < 500000; i++) {
        timeval when;
        gettimeofday(&when, NULL);
    }
}

TEST(X, TimeRdtscl)
{
    for (int i = 0; i < 500000; i++) {
        unsigned long long when;
        rdtscl(&when);
    }
}

TEST(X, TimeMonotomicGetTime)
{
    for (int i = 0; i < 500000; i++) {
        struct timespec when;
        clock_gettime(CLOCK_MONOTONIC, &when);
    }
}

TEST(X, TimeProcessGetTime)
{
    for (int i = 0; i < 500000; i++) {
        struct timespec when;
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &when);
    }
}


TEST(X, TimeThreadGetTime)
{
    for (int i = 0; i < 500000; i++) {
        struct timespec when;
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &when);
    }
}

Here is the rdtsc function I got from here:

inline void rdtscl(unsigned long long *t)
{
    // rdtsc returns the low 32 bits of the counter in EAX and the high
    // 32 bits in EDX; combine them into a single 64-bit value.
    unsigned int l, h;
    __asm__ __volatile__ ("rdtsc" : "=a"(l), "=d"(h));
    *t = ((unsigned long long)l) | (((unsigned long long)h) << 32);
}
  • So you observe an overhead of **up to** 0.00315ms per call. If something is significantly faster or even close to that execution time, it is *really* fast. Are you sure this is actually problematic? (I'm not sure a normal PC can actually use smaller intervals anyway.) – Baum mit Augen Sep 21 '14 at 00:44
  • You should read [time(7)](http://man7.org/linux/man-pages/man7/time.7.html) and arrange your benchmark to measure functions running for longer than 100 milliseconds. – Basile Starynkevitch Sep 21 '14 at 00:50
  • @BaummitAugen the functions I'm timing are about as fast as the timing calls themselves, and they actually do work, so by timing them I'm adding 100% overhead. I was hoping to do better. – marathon Sep 21 '14 at 15:41
  • @marathon In this case I suggest you rearrange your benchmark to measure times big enough for the overhead not to matter, for example by running the functions more times per measurement. I am pretty sure that a normal computer cannot really measure time intervals that small precisely anyway. – Baum mit Augen Sep 21 '14 at 17:55
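
As the comments suggest, the overhead can be amortized by timing a whole batch of calls with a single pair of clock reads and dividing by the iteration count. A rough sketch (the helper name and the way the function under test is passed in are arbitrary choices for illustration):

#include <time.h>

// Time `iterations` calls with one pair of clock_gettime() calls, so the
// per-call clock overhead becomes negligible.
double averageNanosPerCall(void (*fn)(), int iterations)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++) {
        fn();
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsedNs = (end.tv_sec - start.tv_sec) * 1e9
                     + (end.tv_nsec - start.tv_nsec);
    return elapsedNs / iterations;
}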

1 Answer


I created a separate thread that updates a boost::atomic<long> every 1 ms.

My execution thread reads this long for the timestamp.

Much better throughput.
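
Roughly like this (a minimal sketch assuming Boost.Atomic and Boost.Thread are available; the CoarseClock name, microsecond units, and the 1 ms tick are illustrative choices, and a real version would also need a way to stop the ticker thread):

#include <boost/atomic.hpp>
#include <boost/thread.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>
#include <sys/time.h>

// A background thread samples the real clock roughly every 1 ms and
// publishes it through an atomic; profiled code just does a cheap load.
class CoarseClock
{
public:
    CoarseClock() : nowMicros_(0), ticker_(&CoarseClock::run, this) {}

    // Last published timestamp, in microseconds.
    long now() const { return nowMicros_.load(boost::memory_order_relaxed); }

private:
    void run()
    {
        for (;;) {
            timeval tv;
            gettimeofday(&tv, NULL);
            nowMicros_.store(tv.tv_sec * 1000000L + tv.tv_usec,
                             boost::memory_order_relaxed);
            boost::this_thread::sleep(boost::posix_time::milliseconds(1));
        }
    }

    boost::atomic<long> nowMicros_;
    boost::thread ticker_;
};

Each read is just an atomic load, so the per-measurement cost is tiny; the trade-off is that every timestamp can be up to about 1 ms stale (plus whatever the scheduler adds).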
