
I was having some fun with the C time.h library, trying to measure the number of clock ticks taken by some basic functions, just to figure out how fast they actually are. I used the clock() function, and in this case I was measuring printf().

Look at my program:

#include <stdio.h>
#include <time.h>

int main(void)
{
    const int LIMIT = 2000;
    const int LOOP = 20;
    int results[LOOP];

    for(int i=0; i<LOOP; i++)
    {
        int j;
        clock_t time01 = clock();

        for(j=0; j<LIMIT; j++)
        {
            printf("a");
        }

        clock_t time02 = clock();
        results[i] = (int) (time02 - time01);
    }

    for(int i=0; i<LOOP; i++)
    {
        printf("\nCLOCK TIME: %d.", results[i]);        
    }
    getchar();
    return 0;
}

Basically, the program measures 20 times how many clock ticks 2000 calls to printf("a") take.

The strange thing I don't understand is the result. Most of the time, even in other tests, the results fall randomly into two groups:

CLOCK TIME: 31.
CLOCK TIME: 47.
CLOCK TIME: 47.
CLOCK TIME: 31.
CLOCK TIME: 47.
CLOCK TIME: 31.
CLOCK TIME: 47.
CLOCK TIME: 31.
CLOCK TIME: 47.
CLOCK TIME: 47.
CLOCK TIME: 31.
CLOCK TIME: 47.
CLOCK TIME: 31.
CLOCK TIME: 47.
CLOCK TIME: 47.
CLOCK TIME: 31.
CLOCK TIME: 47.
CLOCK TIME: 31.
CLOCK TIME: 47.
CLOCK TIME: 31.

I don't understand exactly how the compiler handles that function. There is presumably some test for the % character, but that wouldn't make such a difference. It looks more like the compiler is doing something in memory... (?) Does anyone know the precise background of how this code is compiled, or why the difference above appears? Or at least a link that would help me?

Thank you.

Jeyekomon
    Most of the time is spend doing I/O and printing. The cost of the actual parsing of the format string (and most of the rest of your program) is minimal compared to that I/O time. –  Feb 25 '12 at 15:59
  • OS scheduling, waiting for I/O... – m0skit0 Feb 25 '12 at 16:01
    You have to remember, you are at the mercy of the OS here, namely the scheduler and the I/O subsystem. Those times seem to indicate whether it took 2 or 3 scheduler quanta. – Jonathon Reinhart Feb 25 '12 at 16:04
  • The typical clock resolution is 10ms. Check with `CLOCKS_PER_SEC` to see if the difference you see is essentially one clock tick. Also, you shouldn't measure `printf`, but rather `sprintf` or something like that which doesn't depend on the speed of your terminal. – Kerrek SB Feb 25 '12 at 16:11

3 Answers


I can think of at least two possible causes:

  1. Your clock has limited resolution.
  2. printf will occasionally be flushing its buffer.
Oliver Charlesworth

Some compilers (in particular, recent versions of gcc on recent Linux distributions, when optimizing with -O2) are able to optimize printf("a") into code very similar to putchar('a').

But most of the time is spent in the kernel doing the write system call.

Basile Starynkevitch
    Well the write sys call always causes a context switch, so at that point your process is waiting to be scheduled again. – Jonathon Reinhart Feb 25 '12 at 16:07
  • Well, magic must be hidden somewhere inside the PC, because when I try this test with an empty string and **puts("");**, it takes literally no time. But **printf("");** really takes some time to do its _nothing_... – Jeyekomon Feb 25 '12 at 16:14

The man page of clock says that it returns an

approximation of processor time used by the program

This approximation is based on the famous Time Stamp Counter. As Wikipedia says:

It counts the number of cycles since reset

Sadly, nowadays, this counter can vary between cores.

There is no promise that the timestamp counters of multiple CPUs on a single motherboard will be synchronized.

So be careful to pin your code to a single CPU; otherwise, you will keep getting strange results. And since you seem to be after precise results, you can use this code instead of calling clock:

  uint64_t rdtsc(void) {
      uint32_t lo, hi;
      /* serialize with cpuid */
      __asm__ __volatile__ ("xorl %%eax,%%eax \n cpuid"
                            ::: "%rax", "%rbx", "%rcx", "%rdx");
      /* We cannot use "=A", since this would use %rax on x86_64
         and return only the lower 32 bits of the TSC */
      __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
      return (uint64_t)hi << 32 | lo;
  }
Coren
  • There is no requirement that clock(3) has to use the TSC. It also does not count cycles but clock ticks, whose length is implementation-dependent (and happens to be 1 µs under POSIX). – jørgensen Feb 26 '12 at 04:45