1

I need to compare some char * (which I know the length of) with some string literals. Right now I am doing it like this:

void do_something(char * str, int len) {
  if (len == 2 && str[0] == 'O' && str[1] == 'K' && str[2] == '\0') {
    // do something...
  }
}

The problem is that I have many comparisons like this to make and it's quite tedious to break apart and type each of these comparisons. Also, doing it like this is hard to maintain and easy to introduce bugs.

My question is if there is shorthand to type this (maybe a MACRO).

I know there is strncmp and I have seen that GCC optimizes it. So, if the shorthand is to use strncmp, like this:

void do_something(char * str, int len) {
  if (len == 2 && strncmp(str, "OK", len) == 0) {
    // do something...
  }
}

Then, I would like to know it the second example has the same (or better) performance of the first one.

  • the second example shouldn't compile because `strncmp` takes a length argument that you did not pass in. – Christian Gibbons Jul 09 '20 at 17:03
  • Fixed. Thank you! –  Jul 09 '20 at 17:05
  • In the second, if you need to check the length then you can use `strcmp` (instead of `strncmp`) which would implicitly do that without the need for `len == 2` test. – P.P Jul 09 '20 at 17:07
  • I think do `strcmp(str, "OK");` and let the compiler doing possible optimization – bruno Jul 09 '20 at 17:08
  • @TertulianoMáximoAfonso If you already know the length (and are testing that it equals 2), then in your first example, why do you bother to also test the `'\0'` at the end? – alani Jul 09 '20 at 17:11
  • Would the compiler still make a function call? –  Jul 09 '20 at 17:11
  • 1
    Only use strcmp if you know both strings are null-terminated. – stark Jul 09 '20 at 17:12
  • @alaniwi case the char* is "OK1" and someone gives the wrong length. This checks that it is a valid C string. Both input and length are function parameters. –  Jul 09 '20 at 17:13
  • @stark I am using strncmp. –  Jul 09 '20 at 17:14
  • @TertulianoMáximoAfonso So if you do not trust the length to be correct, do not use it. – alani Jul 09 '20 at 17:14
  • Why? Should I not validate that the input is valid? –  Jul 09 '20 at 17:16
  • @TertulianoMáximoAfonso But the two examples are not equivalent. In the first case you are testing that the string does not continue with more (non-null) characters after the declared length of 2, in the case with `strncmp` you do not test this. – alani Jul 09 '20 at 17:21
  • I thought strncmp would do this validation. So to validate it, should I use something like if (strncmp(str, "OK\0", len+1) == 0) { ... } ? –  Jul 09 '20 at 17:23
  • @TertulianoMáximoAfonso No you do not need to add an extra `\0`. The `"OK"` is already a null-terminated string constant. The length you use with `strncmp` would normally be set based on the length of storage used for the two strings that you will be comparing (the shorter of whichever buffer contents are not guaranteed to be null-terminated). – alani Jul 09 '20 at 17:31
  • Of course, sorry. But how do I validate it using only strncmp? Because I just tested and strncmp("OK", 2) == strncmp("OK1", 2) == 0. Should then I do something like this: if (strncmp(str, len) == 0 && str[len] == '\0') { ... } –  Jul 09 '20 at 17:36
  • You're doing a while bunch of extra work. Just compare up to and including the NUL. I've posted an answer. – Mad Physicist Jul 10 '20 at 14:43

2 Answers2

1

Yes it will. However, your code is not comparing a char * to a string literal. It is comparing two string literals. The compiler is smart enough to spot this and optimize all the code away. Only the code inside the if block remains.

We can see this by looking at the assembly code generated by the comiler:

cc -S -std=c11 -pedantic -O3 test.c

First with your original code...

#include <stdio.h>
#include <string.h>

int main() {
    unsigned int len = 2;
    char * str = "OK";
    if (len == 2 && strncmp(str, "OK", len) == 0) {
      puts("Match");
    }
}

Then with just the puts.

#include <stdio.h>
#include <string.h>

int main() {
    //unsigned int len = 2;
    //char * str = "OK";
    //if (len == 2 && strncmp(str, "OK", len) == 0) {
      puts("Match");
    //}
}

The two assembly files are practically the same. No trace of the strings remains, only the puts.

    .section    __TEXT,__text,regular,pure_instructions
    .build_version macos, 10, 14    sdk_version 10, 14
    .globl  _main                   ## -- Begin function main
    .p2align    4, 0x90
_main:                                  ## @main
    .cfi_startproc
## %bb.0:
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register %rbp
    leaq    L_.str(%rip), %rdi
    callq   _puts
    xorl    %eax, %eax
    popq    %rbp
    retq
    .cfi_endproc
                                        ## -- End function
    .section    __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
    .asciz  "Match"


.subsections_via_symbols

This is a poor place to focus on optimization. String comparison against small strings is very unlikely to be a performance problem.

Furthermore, your proposed optimization is likely slower. You need to get the length of the input string, and that requires walking the full length of the input string. Maybe you need that for other reasons, but its an increasing edge case.

Whereas strncmp can stop as soon as it sees unequal characters. And it definitely only has to read up to the end of the smallest string.

Schwern
  • 153,029
  • 25
  • 195
  • 336
  • 1
    That's because you are assuming that I know str at compile time. The example serves only to illustrate the problem. str is not set at compile time. It comes from a user. Then I "parse" it to know the appropriate action. –  Jul 09 '20 at 17:20
  • @TertulianoMáximoAfonso Please provide a more clear example. – Schwern Jul 09 '20 at 17:21
  • I think it was pretty clear. You are the one that taught that the example was my whole program and compiled the whole thing. That is a really bad assumption. An example is an example. –  Jul 09 '20 at 17:26
  • Anyway I added a note to question. –  Jul 09 '20 at 17:30
  • @TertulianoMáximoAfonso If you want good answers, you have to provide a clear example of what we're working with. You have to put in the work to make your question as clear as possible. For example, instead of a note, you could replace `char * str = "OK";` with a call to `fgets`. Then there is no confusion. It's important that your question is clear to other people. – Schwern Jul 09 '20 at 17:35
  • Also, about your answer. In the first example, as soon as there is an unequal char, the condition cannot be true and the branch is not taken. –  Jul 09 '20 at 17:48
  • @TertulianoMáximoAfonso Thank you for updating the question, now it might be reopened. I'm sorry, I don't understand what your comment is referring to or why you've mentioned it, could you explain? The first example is your original code. – Schwern Jul 09 '20 at 17:56
  • I am referring to this false statement: "Furthermore, your proposed optimization is likely slower. You need to get the length of the input string, and that requires walking the full length of the input string. Maybe you need that for other reasons, but its an increasing edge case." –  Jul 09 '20 at 18:04
  • The len is given to me and I use it both when calling strncmp and when doing my comparison. As I say in my question "(which I know the length of)" –  Jul 09 '20 at 18:17
  • @TertulianoMáximoAfonso Yes, for the special case where you already know the length of both strings you can optimize the comparison by comparing their lengths. That's likely to do more than unrolling the comparison loop. However, unless both strings have long, equal prefixes, the optimization is unlikely to matter because `strncmp` will stop as soon as it sees a difference. If one string is short, and constant strings will be short, the optimization isn't likely to be worthwhile. It might be slower as `strncmp` is highly optimized. Do some benchmarking. – Schwern Jul 09 '20 at 18:23
  • If the string is shorter then it is not equal and the compression is false as soon as one string ends... –  Jul 09 '20 at 18:26
  • @TertulianoMáximoAfonso [Here's some benchmark code](https://gist.github.com/schwern/08d15f08d634b312536afa754867c29c) to play with. – Schwern Jul 09 '20 at 18:58
0

Your example implies that your strings are always NUL terminated. In that case, don't bother getting their length ahead of time, since that involves searching for the NUL. Instead, you can do

memcmp(str, "OK", 3);

This way, the NULs get compared too. If your length is > 2, the result will be > 0 and if it's shorter, the result will be < 0.

This is a single function call, and memcmp is virtually guaranteed to be better optimized than your hand-written code. At the same time, don't bother optimizing unless you find this code to be a bottleneck. Keep in mind also that any benchmark I run on my machine will not necessarily apply to yours.

The only real reason to make this change is for readability.

Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
  • Also. About getting the length ahead of time. I am reading these chars from a serial device, so I do have a counter for them and thus it's length. Would it be faster if a did something like if (length == 2 && memcmp(str, "OK", 3)) { ... }? –  Jul 10 '20 at 16:07
  • @TertulianoMáximoAfonso. It would be slower if the length usually matches, faster if they don't. – Mad Physicist Jul 10 '20 at 19:13
  • Makes sense. Since I have lots of these comparisons and only one of them matches I will be doing this check. Thank you ;) –  Jul 10 '20 at 19:47