2

I'm learning C.

I find I learn programming well when I try things and received feedback from established programmers in the language.

I decided to write my own strcmp() function, just because I thought I could :)

int strcompare(char *a, char *b) {
    while (*a == *b && *a != '\0') {
        a++;
        b++;
    }
    return *a - *b;
}

I was trying to get it to work by incrementing the pointer in the condition of the while but couldn't figure out how to do the return. I was going for the C style code, of doing as much as possible on one line :)

Can I please get some feedback from established C programmers? Can this code be improved? Do I have any bad habits?

Thanks.

Junior Mayhé
  • 16,144
  • 26
  • 115
  • 161
alex
  • 479,566
  • 201
  • 878
  • 984
  • 2
    "Doing as much as possible on one line" is certainly a style, but not one that should be encouraged! – Oliver Charlesworth Oct 24 '10 at 12:19
  • One comment re your code, though: You should make the function arguments `const char *`. – Oliver Charlesworth Oct 24 '10 at 12:21
  • @Oli Charlesworth I agree, but it looks like a C idiom from reading through K&R. Perhaps that is one bad habit I have picked up already! – alex Oct 24 '10 at 12:21
  • 1
    @Alex: Yes, it is indeed a fairly common idiom, especially in legacy code. People vary on their opinions on this, but IMHO there's very little excuse for doing things like `while(*a++ == *b++);`! – Oliver Charlesworth Oct 24 '10 at 12:23
  • is it meant to crash on NULL? – Chris Becke Oct 24 '10 at 12:52
  • The advantage of doing everything in a single expression is simple: sequence points. The compiler is not allowed to optimize (by re-ordering) side effects over sequence points. so called 'terse' code does give c & C++ compiler's more opportunity to optimize. – Chris Becke Oct 24 '10 at 13:00
  • @Chris: If you put everything on one line, you still have to take sequence points into account! I'd be interested to see an example where terse code gave any improvement at all. – Oliver Charlesworth Oct 24 '10 at 13:27
  • 3
    @Chris: Please stop propagating the ridiculous idea that NULL should be treated as a valid argument to functions. Yes it should have undefined behavior if you pass NULL to the function, just like if you pass `(char *)1` or any other nonsense pointer. – R.. GitHub STOP HELPING ICE Oct 24 '10 at 19:44

6 Answers6

3

If you want to do everything in the while statement, you could write

while (*a != '\0' && *a++ == *b++) {}

I'm not personally a huge fan of this style of programming - readers need to mentally "unpack" the order of operations anyway, when trying to understand it (and work out if the code is buggy or not). Memory bugs are particularly insidious in C, where overwriting memory one byte beyond or before where you should can cause all sorts of inexplicable crashes or bugs much later on, away from the original cause.

Modern styles of C programming emphasize correctness, consistency, and discipline more than terseness. The terse expression features, like pre- and post-increment operations, were originally a way of getting the compiler to generate better machine code, but optimizers can easily do that themselves these days.

As @sbi writes, I'd prefer const char * arguments instead of plain char * arguments.

Doug
  • 8,780
  • 2
  • 27
  • 37
  • Thanks Doug! I thought about doing that actually. I definitely understand what you mean by having to *unpack* the order of operations. +1 – alex Oct 24 '10 at 12:32
  • I'm surprised you came up with a version of that loop condition that would work, but even after looking at it for a while I wouldn't bet anything substantial on it working under all conditions. So I wholeheartedly agree with your excellent analysis, `+1` from me. – sbi Oct 24 '10 at 13:55
2
  1. The function doesn't change the content of a and b. it should probably announce that by taking pointers to const strings.
  2. Most C styles are much terser than many other languages' styles, but don't try to be too clever. (In your code, with several conditions ANDed in the loop conditions, I don't think there's way to put incrementing in there, so this isn't even a question of style, but of correctness.)
sbi
  • 219,715
  • 46
  • 258
  • 445
  • 1
    `strcmp` only has to return greater than, equal to, or less than zero. – Oliver Charlesworth Oct 24 '10 at 12:26
  • @Oli: Ah, Ok, so I was wrong on that. (I'm a C++ developer by heart.) Fixed. – sbi Oct 24 '10 at 12:28
  • Thanks for this - just what I was looking for. Can you please elaborate on 1. I tried when `b` started with `a` (what I think you meant) but it still worked. 3. is a very good point, but I merely forgot to do it :) – alex Oct 24 '10 at 12:29
  • @alex: Two of my original four points were moot (one I found out by myself, one was found by Oli), so I deleted them. That must have been moments before you asked that question in your comment. I suppose your reference to #3 refers to what's now #1, the `const` issue. Yes, that's often forgotten, and that's sad. Using `const` strictly allows the compiler to find errors you make, and that's infinitely to be preferred over errors happening at run-time, because whatever error causes compilation to fail, has no chance on ending up on your customers' machines. – sbi Oct 24 '10 at 12:52
1

I don't know since when putting as much as possible is considered as C-style... I rather associate (obfuscated) Perl with that..

Please DO NOT do this. The best thing to do is one command per line. You will understand why when you try to debug your code :)

To your implementation: Seems quite fine to me, but I would put in the condition that *b is not '\0' either, because you can't know that a is always bigger than b... Otherwise you risk reading in unallocated memory...

apirogov
  • 1,296
  • 1
  • 12
  • 22
  • It seemed to still work when `b` had a greater length than `a` (in my tests). Am I relying on undefined behaviour? – alex Oct 24 '10 at 12:25
  • As far as I know, yes... After all you can't know how big the buffers are, unless you pass their length in as well and the only way left is to look for \0 in all of the strings, otherwise the results might be garbage... – apirogov Oct 24 '10 at 12:29
  • 1
    You only need to test for zero in the one string as, if the zero is encountered in the other string it will be a mismatch and stop iterating. – Chris Becke Oct 24 '10 at 12:56
1

You may find this interesting, from eglibc-2.11.1. It's not far different to your own implementation.

/* Compare S1 and S2, returning less than, equal to or
   greater than zero if S1 is lexicographically less than,
   equal to or greater than S2.  */
int
strcmp (p1, p2)
     const char *p1;
     const char *p2;
{
  register const unsigned char *s1 = (const unsigned char *) p1;
  register const unsigned char *s2 = (const unsigned char *) p2;
  unsigned reg_char c1, c2;

  do
    {
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0')
    return c1 - c2;
    }
  while (c1 == c2);

  return c1 - c2;
}
Matt Joiner
  • 112,946
  • 110
  • 377
  • 526
  • Wow, how ridiculously overcomplicated. This looks like it was written for a compiler from the 80s that had no idea how to optimize. If you're going to do something fancy, at least make it useful (for instance comparing whole words at a time). – R.. GitHub STOP HELPING ICE Oct 24 '10 at 21:04
0

A very subtle bug: strcmp compares bytes interpreted as unsigned char, but your function is interpreting them as char (which is signed on most implementations). This will cause non-ascii characters to sort before ascii instead of after.

R.. GitHub STOP HELPING ICE
  • 208,859
  • 35
  • 376
  • 711
0

This function will fail if limits of (insigned) char is equal to or greater than limits of int because of integer overflow.

For example if you compile it on DSP which have 16 bit char with limits 0...65536 and 16 bit int with limits of -32768...32767, then if you try to compare strings like "/uA640" and "A" the result will be negative, which is not true.

This is exotic and weird problem but it appear when you write universal implementation.

Vovanium
  • 3,798
  • 17
  • 23