16

According to http://www.cplusplus.com/reference/cstdlib/strtol/ this function has a signature of long int strtol (const char* str, char** endptr, int base).

I wonder, though: If it gets passed a const char * to the beginning of the string, how does it manage to turn that into a non-const pointer to the first unprocessed character without cheating? What would an implementation of strtol look like that doesn't perform a const_cast?

Enno

4 Answers

9

How do you implement strtol under const-correctness?

You don't, because strtol's definition is inherently not const-correct.

This is a flaw in the C standard library.

There are several standard functions that take a const char* argument (expected to point to the beginning of a character array) and give back a non-const char* pointer that can be used to modify that array.

strchr is one example:

char *strchr(const char *s, int c);

For example:

#include <string.h>
int main(void) {
    const char *s = "hello";
    char *ptr = strchr(s, 'h');
    *ptr = 'H';
}

This program has undefined behavior. On my system, it dies with a segmentation fault.

The problem doesn't occur in strchr itself. It promises not to modify the string you pass to it, and it doesn't. But it returns a pointer that the caller can then use to modify it.

The ANSI C committee, back in the late 1980s, could have split each such function into two versions, one that acts on const character arrays and another for non-const arrays:

char *strchr(char *s, int c);
const char *strcchr(const char *s, int c);

But that would have broken existing pre-ANSI code, written before const existed. This is the same reason C has not made string literals const.
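As a rough sketch of what that split could have looked like (the names my_strchr and my_strcchr are placeholders chosen here to avoid clashing with <string.h>; they are not standard functions), neither version needs a cast:

```c
#include <stddef.h>

/* Hypothetical read-only version: const in, const out. */
const char *my_strcchr(const char *s, int c) {
    for (;; s++) {
        if (*s == (char)c)
            return s;            /* also finds the terminator when c == 0 */
        if (*s == '\0')
            return NULL;         /* reached the end without a match */
    }
}

/* Hypothetical mutable version: non-const in, non-const out. */
char *my_strchr(char *s, int c) {
    for (;; s++) {
        if (*s == (char)c)
            return s;
        if (*s == '\0')
            return NULL;
    }
}
```

With this pair, assigning the result of my_strcchr to a plain char * would discard a qualifier and draw a diagnostic, so the loophole shown in the earlier example would be closed.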

C++, which inherits most of C's standard library, deals with this by providing overloaded versions of some functions.

The bottom line is that you, as a C programmer, are responsible for not modifying objects you've defined as const. In most cases, the language helps you enforce this, but not always.

As for how these functions manage to return a non-const pointer to const data, they probably just use a cast internally (not a const_cast, which exists only in C++). That's assuming they're implemented in C, which is likely but not required.
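To make the cast concrete, here is a simplified sketch of a strtol-like function. The name sketch_strtol is hypothetical, and this is not how any particular libc does it: it assumes bases 2 to 36 only, skips base-0 and "0x" prefix handling, and assumes an ASCII-like character set. The only place const is dropped is the single assignment to *endptr.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Map a character to its digit value, or -1 if it is not a digit.
   Assumes letters are contiguous, as in ASCII. */
static int digit_value(int ch) {
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'z') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'Z') return ch - 'A' + 10;
    return -1;
}

/* Hypothetical, simplified stand-in for strtol (bases 2..36 only). */
long sketch_strtol(const char *str, char **endptr, int base) {
    const char *p = str;
    const char *end = str;          /* stays at str if no digits are consumed */
    unsigned long acc = 0;
    unsigned long limit;
    int negative = 0;
    int overflow = 0;

    if (base >= 2 && base <= 36) {
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '+' || *p == '-') {
            negative = (*p == '-');
            p++;
        }
        /* Largest magnitude that fits: LONG_MAX, or LONG_MAX + 1 when negative. */
        limit = negative ? (unsigned long)LONG_MAX + 1UL : (unsigned long)LONG_MAX;

        for (;; p++) {
            int digit = digit_value((unsigned char)*p);
            if (digit < 0 || digit >= base)
                break;
            if (acc > (limit - (unsigned long)digit) / (unsigned long)base)
                overflow = 1;       /* keep consuming digits, stop accumulating */
            else
                acc = acc * (unsigned long)base + (unsigned long)digit;
            end = p + 1;
        }
    }

    if (endptr != NULL)
        *endptr = (char *)end;      /* the one place const is cast away */

    if (overflow) {
        errno = ERANGE;
        return negative ? LONG_MIN : LONG_MAX;
    }
    if (negative)
        return acc == (unsigned long)LONG_MAX + 1UL ? LONG_MIN : -(long)acc;
    return (long)acc;
}
```

The function reads through const char * pointers throughout and never writes to the string; the cast only changes the type of the pointer handed back to the caller.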

Keith Thompson
2

Most likely it just uses casting.

There are numerous functions in the standard library with this same property. Sacrificing type safety for simplicity is the likely reason, since you cannot overload functions in C as you can in C++.

The library expects the programmer to take responsibility and not write through *endptr if str is, for example, a string literal.
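One way to take that responsibility, sketched here with a hypothetical helper named parse_long, is to copy the end pointer straight back into a const char * and only ever read through it:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the value in *out if the whole
   string is a valid decimal long, otherwise returns 0. */
int parse_long(const char *str, long *out) {
    char *end;
    long value;
    const char *rest;

    errno = 0;
    value = strtol(str, &end, 10);
    rest = end;                     /* restore const; read-only from here on */

    if (rest == str)                /* no digits were consumed */
        return 0;
    if (*rest != '\0')              /* trailing characters after the number */
        return 0;
    if (errno == ERANGE)            /* value out of range for long */
        return 0;

    *out = value;
    return 1;
}
```

This is safe to call with a string literal, because nothing is ever written through end.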

With its limited type system, C is a practical language for practical people.

user694733
1

strtol does do a const_cast (or equivalent). Casting const away is not a problem in itself; using the resulting pointer to modify an originally const pointee may be.

But strtol just returns this pointer to you without tampering with it, so everything is fine.
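To illustrate the harmless case, here is a small sketch using a hypothetical helper, parse_and_terminate. The caller owns a modifiable buffer, so writing through the pointer that strtol hands back is well-defined:

```c
#include <stdlib.h>

/* Hypothetical helper: parses a decimal number at the start of buf and cuts
   the string off right after it. buf must point to a modifiable array. */
long parse_and_terminate(char *buf) {
    char *end;
    long value = strtol(buf, &end, 10);

    *end = '\0';    /* fine: the underlying object is not const */
    return value;
}
```

Passing a string literal or a const array to the same logic would compile just as quietly if the parameter were const char *, and that write would then be undefined behavior.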

Quentin
  • Not everything is fine. It's possible to pass a pointer to a const array to `strtol`, and use the resulting `*endptr` pointer to (attempt to) modify that array, all without using a cast. It's a flaw in the C standard library. – Keith Thompson Jan 13 '16 at 18:17
  • @KeithThompson a tradeoff has been made, and a bit of type-safety has been chipped away in favour of a better interface. Still, from the standard's perspective, everything is fine indeed : no UB or other dangers. – Quentin Jan 13 '16 at 21:28
  • Look at the example program in my answer. The function itself doesn't violate `const` safety, but it enables user code that uses it to do so without a cast or a warning. A better interface would provide two different functions, one for `const` data and one for non-`const` data. – Keith Thompson Jan 13 '16 at 22:04
  • @KeithThompson I did understand what you meant. C has no function overloading, so two differently named functions would have had to be created. In hindsight I don't think this `const`ness loophole is better than two separate functions, but the choice that has been made does make sense. – Quentin Jan 13 '16 at 22:07
  • Which is why I suggested using two functions with different names. My point is this: you say "everything is fine", but it isn't. The standard library creates a loophole that makes it too easy to break `const` correctness. (The lack of `const`ness for string literals is a similar loophole.) – Keith Thompson Jan 13 '16 at 22:13
  • @KeithThompson My answer is just : `const_cast` is not a cheat, and would be the right tool to achieve `strtol`'s interface. Whether such an interface is a good idea in the first place is both out of scope of the question, and a point we both agree on (and our answer is "no"). – Quentin Jan 13 '16 at 22:19
  • `const_cast`? In C? But my point is that your statement that "everything is fine" seems to go beyond the scope of the question. Sure, converting the `const char*` pointer to `char*` inside `strtol()` is the sensible thing to do. If you removed the last 4 words of your answer I'd have no objection. – Keith Thompson Jan 13 '16 at 23:06
  • @KeithThompson "`const_cast`" to echo the question's wording, "or equivalent" to acknowledge it's not actually a `const_cast` because we're in C. – Quentin Jan 13 '16 at 23:08
  • Good points made here. The way I understand Quentin, it's because as a caller to strtol, I might have a `char *` and want to get an offset into it that I can write to, but if strtol was defined to work on a `char *`, it wouldn't accept a `const char *` as its first argument. You can't have it both ways, because there's no function overloading, and writing two separate functions is just making the libc even bigger for very little gain. Thanks, my understanding has broadened! – Enno Jan 16 '16 at 11:29
1

How do you implement strtol under const-correctness?

Use of C11 _Generic would allow code to call either of

// when passed argument for `str` is `char *` and for `endptr` is `char **`
long strtol(const char* str, char** endptr, int base);
// or
// when passed argument for `str` is `const char *` and for `endptr` is `const char **`
long strtol_c(const char* str, const char** endptr, int base);
// and warn/error otherwise

The implementations of the two can be identical, as in the example below, because only the function signatures need to differ. Since the dispatching macro behaves differently from strtol(), it should be called something else, such as strtol_s().

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

long int strtol_c(const char * restrict nptr, const char ** restrict endptr, int base) {
  return strtol((char *) nptr, (char **) endptr, base);
}

#define strtol_s(n,e,b) _Generic(n, \
  char *: strtol((n), (e), (b)), \
  const char *: strtol_c((n), (e), (b)), \
  default: 0 \
  )

int main(void) {
  char *src = malloc(100);
  strcpy(src, "456");
  const char *srcc = "123";
  char *endptr;
  const char *endcptr;
  long L[6] = { 0 };

  // OK - matching str and *endptr
  L[0] = strtol_s(src,  &endptr, 0);

  // warning: passing argument 2 of 'strtol' from incompatible pointer type
  L[1] = strtol_s(src,  &endcptr, 0);

  // warning: passing argument 2 of 'strtol_c' from incompatible pointer type
  L[2] = strtol_s(srcc, &endptr, 0);

  // OK - matching str and *endptr
  L[3] = strtol_s(srcc, &endcptr, 0);

  // OK
  L[4] = strtol(src, &endptr, 0);

  // warning: passing argument 2 of 'strtol' from incompatible pointer type
  L[5] = strtol(src, &endcptr, 0);
  return !L[0];
}

What is lost: strtol_s() is a macro, not a true function, so a pointer to it cannot be made.

how does it manage to turn that into a non-const pointer to the first unprocessed character without cheating?

strtol(), although it takes a char **endptr as the second argument, does not modify **endptr. It only stores a pointer in *endptr and never writes through that pointer.

chux - Reinstate Monica