The strncmp() function really only has one use case (for lexicographical ordering):

One of the strings has a known length,† the other string is known to be NUL terminated. (As a bonus, the string with known length need not be NUL terminated at all.)

The reasons I believe there is just one use case (prefix match detection is not lexicographical ordering):‡ (1) if both strings are NUL terminated, strcmp() should be used, as it will do the job correctly; and (2) if both strings have known length, memcmp() should be used, as it avoids the unnecessary check against NUL on every byte.

I am seeking an idiomatic (and readable) way to use the function to lexicographically compare two such arguments correctly (one of them is NUL terminated, one of them is not necessarily NUL terminated, with known length).

Does an idiom exist? If so, what is it? If not, what should it be, or what should be used instead?

Simply using the result of strncmp() won't work, because it will result in a false equality result in the case that the argument with known length is shorter than the NUL terminated one, and it happens to be a prefix. Therefore, extra code is required to test for that case.
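
To make the failure concrete, here is a minimal sketch. The helper name `naive_compare` is hypothetical and only exists for illustration; it does nothing beyond forwarding to strncmp().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: trusts the result of strncmp() alone.
 * key is NUL terminated; input has exactly inputlen non-NUL bytes
 * and need not be NUL terminated. */
int naive_compare(const char *key, const char *input, size_t inputlen)
{
    /* Only the first inputlen bytes are examined, so anything in key
     * beyond that point is ignored. */
    return strncmp(key, input, inputlen);
}
```

With key "foobar" and the 3-byte input "foo", this returns 0 even though the two strings differ, which is the false equality described above.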

As a standalone function I don't see much wrong with this construction, and it appears idiomatic:

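#include <string.h>

/* s1 is NUL terminated; s2 has exactly s2len non-NUL bytes and need not
 * be NUL terminated. Returns <0, 0, or >0, like strcmp(). */
int variation_as_function (const char *s1, const char *s2, size_t s2len) {
    int result = strncmp(s1, s2, s2len);
    if (result == 0) {
        /* s2 is a prefix of s1: they are equal only if s1 ends here too */
        result = (s1[s2len] != '\0');
    }
    return result;
}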

However, when inlining this construction into code, it results in a double test for 0 when equality needs special action:

int result = strncmp(key, input, inputlen);
if (result == 0) {
    result = (key[inputlen] != '\0');
}
if (result == 0) {
    do_something();
} else {
    do_something_else();
}

The motivation for inlining the call is because the standalone function is esoteric: It matters which string argument is NUL terminated and which one is not.

Please note, the question is not about performance, but about writing idiomatic code and adopting best practices for coding style. I see there is some DRY violation with the comparison. Is there a straightforward way to avoid the duplication?
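
For the case where only equality matters, one possible way to fold the two tests into a single condition is sketched below. This is not an established idiom, just an illustration; the function name `dispatch` and its return values are hypothetical stand-ins for the do_something() / do_something_else() branches.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical dispatcher: returns 1 when the "equal" branch is taken and
 * 0 otherwise. key must be NUL terminated; input has exactly inputlen
 * non-NUL bytes and need not be NUL terminated. */
int dispatch(const char *key, const char *input, size_t inputlen)
{
    /* The short-circuit && means key[inputlen] is read only after
     * strncmp() has confirmed that key has at least inputlen bytes. */
    if (strncmp(key, input, inputlen) == 0 && key[inputlen] == '\0') {
        return 1;   /* stands in for do_something() */
    } else {
        return 0;   /* stands in for do_something_else() */
    }
}
```

This only covers the equal / not-equal decision. When the sign of the result is needed for ordering, the two-step form (or a function wrapping it) is still required.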


† By known length, I mean the length is correct (there is no embedded NUL that would truncate the length). In other words, the input was validated at some earlier point in the program, and its length was recorded, but the input is not explicitly NUL terminated. As a hypothetical example, a scanner on a stream of text could have this property.
‡ As has been pointed out by addy2012, strncmp() could be used for prefix matching. I was focused on lexicographical ordering. However, (1) If the length of the prefix string is used as the length argument, both arguments need to be NUL terminated to guard against reading past an input string shorter than the prefix string. (2) If the minimum length is known between the prefix string and the input string, then memcmp() would be a better choice in terms of providing equivalent functionality at less CPU cost and no loss in readability.
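
A rough sketch of point (2), assuming both lengths are known and correct. The helper name `has_prefix_known_len` is made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if input begins with prefix, 0 otherwise.
 * Neither argument needs to be NUL terminated because both lengths are
 * assumed to be known. */
int has_prefix_known_len(const char *input, size_t inputlen,
                         const char *prefix, size_t prefixlen)
{
    /* An input shorter than the prefix cannot match; checking first also
     * keeps memcmp() from reading past the end of input. */
    if (inputlen < prefixlen) {
        return 0;
    }
    return memcmp(input, prefix, prefixlen) == 0;
}
```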

jxh
  • what are you trying to do? `strncmp` compares the first n characters.. you're doing something different... – Karoly Horvath May 05 '15 at 23:27
  • why do you inline the code? that's bad practice... – Karoly Horvath May 05 '15 at 23:29
  • @KarolyHorvath: I addressed the reason for inline in the question, thanks. – jxh May 05 '15 at 23:40
  • Idiomatic use of strncmp is when any of two provided strings are known to be: a) null-terminated; b) a fixed buffer of size n that *may* be null-terminated. If you are curious, strncpy/stpncpy generate buffers of that kind. Most people are scared of optionally terminated buffers though. – user3125367 May 06 '15 at 00:17
  • @user3125367: Idiomatic use of `strncpy()` is actually more problematic. – jxh May 06 '15 at 00:39
  • You just look on it from the perspective that is entirely problematic. – user3125367 May 06 '15 at 01:02
  • @user3125367: If you have knowledge of an idiomatic `strncmp()` usage different than what I have here, could you post an answer with examples? – jxh May 06 '15 at 01:47
  • @jxh: That motivation isn't very convincing, I'm afraid. It doesn't explain the need for manual inlining. – Lightness Races in Orbit May 21 '15 at 11:35
  • @Lightness: It is mentioned in the post. The function call is less clear than inlined code, because it is not clear that the first argument to the function must be NUL terminated. – jxh May 21 '15 at 13:51
  • @jxh: I said that I think what you mentioned doesn't explain it. So the response "I mentioned it" doesn't help! Why is it "not clear"? You documented it right there above the function. I still see no connection between that parameter constraint and manual inlining. – Lightness Races in Orbit May 21 '15 at 14:39
  • @Lightness: I see your point. If you feel that the comment on the function is sufficient for code clarity, then you will not be motivated to manually inline the call. Many users of comparison functions will reverse arguments to reverse the sense of the comparison. – jxh May 21 '15 at 14:59
  • @jxh: If they do so without reading the documentation for that function, then they are (pardon my language) _stupid_. And deserve every bit of pain they get. And no job in _my_ team! This is no reason to butcher your code into a mess of duplication and inexpressiveness. Just leave it as a function, seriously. This is what functions are for. – Lightness Races in Orbit May 21 '15 at 15:14
  • @LightnessRacesinOrbit: I am not opposed to using functions. I would have rather had the asymmetric parameter requirement be enforceable in some way, though. I was hoping an idiomatic way of manually inlining it would lead to more correct code, as well as more clarity, as the idiom would be self-reinforcing. However, perhaps I will make the second string a single argument (like an `iovec`), and that will be that. – jxh May 21 '15 at 16:32
  • @jxh: I honestly don't see the problem with simply having your function, giving its parameters meaningful names and documenting its behaviour. You should be doing that _anyway_, so what does unfunctioning it win you? – Lightness Races in Orbit May 21 '15 at 17:33
  • @LightnessRacesinOrbit: You continue to repeat this same point which I have already conceded. Or is there a further point you are trying to make? – jxh May 21 '15 at 17:40
  • @jxh: No I guess we're going in circles. I guess I'm trying to hammer home that even making it enforceable is too much. That would be extremely expensive (actually, computationally infeasible if you think about it) to implement. C and C++ have _always_ preferred, idiomatically, that we _document_ pre-conditions rather than checking them at compile-time or even runtime. Granted, sometimes this pattern is broken (`std::vector::at`, Concepts), but this is the exception rather than the rule. – Lightness Races in Orbit May 21 '15 at 17:41
  • @LightnessRacesinOrbit: Your point is well taken! I believe that an enforceable API is superior to one that is not, and is especially useful for ones that are easy to abuse and for which the abuse is difficult to detect in a code review or static analysis. Compile time enforceability is best, and can always be left enabled. If the enforcement is runtime, it may not be always enabled, but at least its availability is better than nothing. I believe my suggestion of using an `iovec` parameter helps toward compile time enforceability, as well as with spotting abuse during code review. – jxh May 21 '15 at 17:55
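
The single-argument idea from the last few comments could look roughly like the sketch below. Instead of `struct iovec` it uses a hypothetical `struct counted_str`, purely to avoid depending on `<sys/uio.h>`; the type and function names are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string type: len bytes at ptr, none of them NUL,
 * with no terminator required. */
struct counted_str {
    const char *ptr;
    size_t len;
};

/* Compares NUL-terminated s1 against counted s2.
 * Returns <0, 0, or >0, like strcmp(). */
int compare_cstr_counted(const char *s1, struct counted_str s2)
{
    int result = strncmp(s1, s2.ptr, s2.len);
    if (result == 0) {
        /* The first s2.len bytes match, so s1 is either equal or longer. */
        result = (s1[s2.len] != '\0');
    }
    return result;
}
```

Because the two parameters now have different types, swapping them at a call site no longer compiles, which is the kind of compile-time help discussed above.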

2 Answers

The strncmp() function really only has one use case:

One of the strings has a known length, the other string is known to be NUL terminated.

No, you can use it to compare the beginnings of two strings, no matter if the length of any string is known or not. For example, if you have an array / a list with last names, and you want to find all which begin with "Mac".
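
A small sketch of that last-name example, assuming every name and the prefix are NUL terminated. The function name `count_with_prefix` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Counts the entries in names[0..count-1] that begin with prefix.
 * Every name and the prefix are assumed to be NUL terminated. */
size_t count_with_prefix(const char *const names[], size_t count,
                         const char *prefix)
{
    size_t prefixlen = strlen(prefix);
    size_t matches = 0;
    for (size_t i = 0; i < count; i++) {
        /* strncmp() stops at a NUL, so a name shorter than the prefix
         * is never read past its terminator. */
        if (strncmp(names[i], prefix, prefixlen) == 0) {
            matches++;
        }
    }
    return matches;
}
```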

adjan
  • You are assuming both arguments are NUL terminated. – jxh May 05 '15 at 23:56
  • @jxh: `memcmp` can only be used if it is safe to read past a NUL terminator. Otherwise `strncmp` and `memcmp` are identical. – Chris Dodd May 05 '15 at 23:56
  • @ChrisDodd: `memcmp()` does not do a NUL check on each byte iteration. `memcmp()` is better when you know the lengths of both arguments. – jxh May 05 '15 at 23:57
  • @jxh: exactly -- it does no NUL check, so it might read past a NUL on the string(s). `strncmp` is safer in that it will never read past a NUL. – Chris Dodd May 05 '15 at 23:58
  • @ChrisDodd: I am not disputing that `strncmp()` can be used as a prefix matching function. But, the use case described here assumes both arguments are NUL terminated. If one of them is not, `strncmp()` doesn't help you from potentially reading over a buffer. – jxh May 06 '15 at 00:01
  • @ChrisDodd: If we consider that one of the arguments is not NUL terminated, then we are either left with my use case again, or the case where both lengths are known, in which case `memcmp()` is better. – jxh May 06 '15 at 00:03
  • @jxh: `strncmp` will never read more than the specified number of characters from either string, so will work fine even if neither string is NUL terminated. If the strings are NUL terminated and happen to be identical (but of unknown length, shorter than the buffer length), `memcmp` will compare data in the buffer after the end of the strings. – Chris Dodd May 06 '15 at 00:07
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/77055/discussion-between-jxh-and-chris-dodd). – jxh May 06 '15 at 00:09
  • @ChrisDodd: If neither string is NUL terminated, their lengths must be known, in which case, `memcmp()` will be the better choice. For the prefix matching problem, assuming reading past the end of an input causes undefined behavior, either the inputs are NUL terminated (so that an input shorter than "Mac" will correctly stop `strncmp()`), or the input lengths are known, in which case `memcmp()` is the better choice, since "Mac" also has a known length. – jxh May 06 '15 at 00:51
  • @ChrisDodd: Your point about the known length being wrong (embedded NUL in the input) is a valid one, since there is no accounting for bugs. – jxh May 06 '15 at 02:58
  • I am getting the impression that the only use case for `strncmp` is for prefix matching on two NUL terminated strings (and known prefix length). If you can come up with an argument as to why my use case is unreasonable, I would be happy to select this as the best answer. – jxh May 06 '15 at 22:26
  • Another usage case is for comparing a null-terminated string with a null-padded string stored in a fixed-length field. The latter pattern is perhaps not as common as it used to be, but it's not so obscure as some descriptions of `strncpy` (the proper function to use when converting from a null-terminated to a null-padded string) seem to claim. – supercat Jun 23 '16 at 20:24
  • @jxh: What I really wish C included, though, would be a function which was like memcmp, but which returned the offset of the first mismatch. A quicksort routine that knows all the items within a partition are identical in the first 100 characters could save a fair bit of time versus one that needs to include those characters in every string comparison performed within the partition. – supercat Jun 23 '16 at 20:24
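
The null-padded field case mentioned in the comments above might be sketched as follows. The field width `NAME_FIELD_SIZE`, the `struct record` layout, and the function name are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define NAME_FIELD_SIZE 8   /* hypothetical fixed field width */

/* Hypothetical record with a null-padded name that is not terminated
 * when it fills the whole field. */
struct record {
    char name[NAME_FIELD_SIZE];
};

/* Returns 1 if the NUL-terminated key equals the record's name field. */
int record_name_equals(const struct record *rec, const char *key)
{
    /* A key longer than the field can never be stored in it, and without
     * this check it would compare equal on its first 8 bytes alone. */
    if (strlen(key) > NAME_FIELD_SIZE) {
        return 0;
    }
    /* strncmp() stops at the padding NUL for short names and at the field
     * boundary for names that fill the field completely. */
    return strncmp(rec->name, key, NAME_FIELD_SIZE) == 0;
}
```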

In fact, strncmp should generally be used in preference to strcmp unless you absolutely know that both strings are well-formed and nul-terminated.

Why? Because otherwise you have a vulnerability to buffer overflows.

This rule is unfortunately not followed often.

There are a lot of buffer overflow errors.

Update

I think the core error here is in "one of the strings has a known length". No C string has a known length a priori. They're not like Pascal or Java strings, which are essentially a pair of (length, buffer). A C string is by definition a char[] identifying a chunk of memory, with the distinguished symbol \0 to identify the end. strncmp, strncpy etc exist to protect against attempts to use a chunk of memory as a string that is not well-formed.

Charlie Martin
  • Not both — only one of the strings needs to be null-terminated. The comparison will stop as soon as two bytes are found to be different. – r3mainer May 05 '15 at 23:41
  • "...should generally be used in preference..." Er... And where "generally" are you going to obtain the value of `n` for `strncmp`? – AnT stands with Russia May 05 '15 at 23:55
  • @AnT: from the size of the buffers holding the strings. If the buffers are of different sizes, you need to use the minimum. – Chris Dodd May 06 '15 at 00:09
  • @ChrisDodd void foo(char *a, char *b) { ... } – print the size of them buffers, please. – user3125367 May 06 '15 at 00:26
  • @user3125367: You need to either have a global buffer size used for all buffers in the program, or pass the buffer size as an argument. If you don't, your program is just a bug waiting to happen. – Chris Dodd May 06 '15 at 00:37
  • @ChrisDodd I don't, actually, unless there is real need for the buffer size. Can you please link to source codes of successful C projects using that approach for virtually all strings? (Not including python, perl, pascal, etc. ofc.) – user3125367 May 06 '15 at 00:50
  • Or you need to have a bound on the buffer size that you set arbitrarily. As to successful projects using this technique, better you should google "buffer overflow exploits" or read this note: http://www.eecis.udel.edu/~bmiller/cis459/2007s/readings/buff-overflow.html That the bug is ubiquitous doesn't make it less troublesome. – Charlie Martin May 06 '15 at 01:17
  • "[`strncpy`? just say no](http://blog.liw.fi/posts/strncpy/)" is a great explanation why not to use `strnXXX()` functions so commonly, and "[Stating that something is harmful sometimes is harmful](https://blog.flameeyes.eu/2007/08/stating-that-something-is-harmful-sometimes-is-harmful)" is another good read. –  May 06 '15 at 02:26
  • Also worth mentioning are the nonstandard `strlcpy()` and `strlcat()` functions that avoid buffer overflows by truncating copied strings if you don't have enough space, making your program [arguably incorrect](https://www.sourceware.org/ml/libc-alpha/2000-08/msg00058.html). Ultimately, there is no "one size fits all" solution when it comes to string handling. –  May 06 '15 at 02:26
  • By "known length", what is meant is that the input was validated in some earlier point in the program, the length was recorded along with the input, but the input was never explicitly NUL terminated. It is a string in the abstract sense (it consists of non NUL characters), not the strict C language sense (a NUL terminated byte sequence). As you pointed out, there are standard C functions that have `str*` in their name, but can generate output without NUL termination by design. – jxh May 06 '15 at 15:41
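
The "use the minimum of the buffer sizes" suggestion from the comments above could be sketched like this, assuming the caller passes each buffer's capacity explicitly. The name `bounded_compare` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compares two strings held in buffers of known
 * capacity, never reading more bytes than the smaller capacity. */
int bounded_compare(const char *a, size_t a_cap, const char *b, size_t b_cap)
{
    size_t limit = (a_cap < b_cap) ? a_cap : b_cap;
    return strncmp(a, b, limit);
}
```

Note that a result of 0 here only proves equality if a NUL was reached within the limit. If the smaller buffer is completely full and unterminated, 0 means only that the first `limit` bytes match, which is the same prefix caveat raised in the question.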