Can someone explain why this function is the way it is?

Question

I was looking at functions from GNU C library, glibc-2.18, and this is the code I found for strncmp.c

Looking at it I don't understand why it's written this way. Is this loop unrolling? Why not use 5 or 10 instead of 4? And why did they write like this instead of using a more straightforward approach?

/* Compare no more than N characters of S1 and S2,
   returning less than, equal to or greater than zero
   if S1 is lexicographically less than, equal to or
   greater than S2.  */
int
STRNCMP (const char *s1, const char *s2, size_t n)
{
  unsigned char c1 = '\0';
  unsigned char c2 = '\0';

  if (n >= 4)
    {
      size_t n4 = n >> 2;
      do
    {
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0' || c1 != c2)
        return c1 - c2;
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0' || c1 != c2)
        return c1 - c2;
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0' || c1 != c2)
        return c1 - c2;
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0' || c1 != c2)
        return c1 - c2;
    } while (--n4 > 0);
      n &= 3;
    }

  while (n > 0)
    {
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0' || c1 != c2)
    return c1 - c2;
      n--;
    }

  return c1 - c2;
}

Can someone explain the logic behind the code?

Thanks.

Yes, this looks like loop unrolling done by hand. The idea is that it will be faster, because of fewer tests and jumps. — Thomas Padron-McCarthy, Dec 15 '13 at 08:40

Bernd Elkemann · Answer 1 · 2013-12-15T09:52:51.890

2

Yes it is a partially unrolled loop.

The 4:

1) The general size: There is always a trade-off between more branching (small number) and larger code-size (large number).

2) The specific size (4, not 5): As Joachim Isaksson points out the >> before the loop would need to be changed to a division if chosing 5 instead of 4. He says that on some (eg embedded) CPU's that becomes significant. (Often we only think about what the loop costs, not its setup, but he is probably right because most strings tend to be small!)

Note that if you wanted to unroll a loop to any number (regardless if power of 2 or not) then you could - instead of doing a division and maintaining the n4 - you could compute the end-pointer of s1 and check s1 versus it at the end:

char* s1end = s1 + n;
do {...} while(s1 < s1end);

edited Dec 15 '13 at 09:52

answered Dec 15 '13 at 08:43

Bernd Elkemann

23,242
4
37
66

The reason for 4 instead of 5 is most likely that divide/mod by 4 is trivial to calculate, while division by 5 isn't necessarily on all machines. – Joachim Isaksson Dec 15 '13 at 08:48
@JoachimIsaksson there is no division in the loop – Bernd Elkemann Dec 15 '13 at 08:49
No, but a division by 4 `size_t n4 = n >> 2;` and a mod operation `n &= 3;` outside the loop. As a comparison, division by 5 on a CPU without a dedicated division instruction would probably completely negate all speed gain. – Joachim Isaksson Dec 15 '13 at 09:00
@JoachimIsaksson You are right, if they were thinking about that kind of CPU that is probably the reason for the 4. – Bernd Elkemann Dec 15 '13 at 09:36

score 1 · Answer 2 · answered Dec 15 '13 at 08:43

1

This way there are fewer branches (if statements) and it allows for the compiler to optimize for pipe-lining.

answered Dec 15 '13 at 08:43

randomusername

7,927
23
50

score 1 · Answer 3 · answered Dec 15 '13 at 08:44

Yes this is loop unrolling.

Why 4 and not 5 or 10. Probably because this is intended to be cache friendly and make sure the calls are aligned to memory pages/rows.

Question could be why not 8, 16 or 32. Maybe code size, maybe it's easier to generate optimized assembly for some older processors when it's 4. Or the actual code would be too large to fit typical instruction cache.

Can someone explain why this function is the way it is?

3 Answers3