strtol("0x", &endptr, 16);

This leaves endptr pointing to "0x"; I expected it to point to "x". Is my C library's strtol() amiss, are my expectations wrong, or is it something else?

The "0x" looks like the start of the optional "0x" prefix, yet since it is not followed by a hex digit, I would expect it not to qualify as a prefix: the value should be parsed from "0", leaving a non-numeric trailing "x". As is, strtol() reports that no conversion was performed, since the end pointer points to the beginning of the string.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *endptr = "None";
  long val;

  errno = 0;
  val = strtol("0q", &endptr, 16);
  printf("val:%ld, errno:%d, endptr:<%s>\n", val, errno, endptr);

  errno = 0;
  val = strtol("0x", &endptr, 16);
  printf("val:%ld, errno:%d, endptr:<%s>\n", val, errno, endptr);

  return 0;
}

Output:

val:0, errno:0, endptr:<q>
val:0, errno:0, endptr:<0x>  (Expected <x>)

C spec (emphasis mine):

... If the value of base is 16, the characters 0x or 0X may optionally precede the sequence of letters and digits, following the sign if present. C17dr § 7.22.1.4 3

The subject sequence is defined as the longest initial subsequence of the input string, starting with the first non-white-space character, that is of the expected form. The subject sequence contains no characters if the input string is empty or consists entirely of white space, or if the first non-whitespace character is other than a sign or a permissible letter or digit. C17dr § 7.22.1.4 4

Select build output info:

Invoking: Cygwin C Compiler
gcc -std=c11 -O0 -g3 -pedantic -Wall -Wextra -Wconversion -c -fmessage-length=0 -v -MMD -MP -MF"Day1.d" -MT"Day1.d" -o "Day1.o" "../Day1.c"
Target: x86_64-pc-cygwin
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.2.0 (GCC) 
COLLECT_GCC_OPTIONS='-std=c11' '-O0' '-g3' '-Wpedantic' '-Wall' '-Wextra' '-Wconversion' '-c' '-fmessage-length=0' '-v' '-MMD' '-MP' '-MF' 'Day1.d' '-MT' 'Day1.d' '-o' 'Day1.o' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-pc-cygwin/11/cc1.exe -quiet -v -MMD Day1.d -MF Day1.d -MP -MT Day1.d -dD -idirafter /usr/lib/gcc/x86_64-pc-cygwin/11/../../../../lib/../include/w32api -idirafter /usr/lib/gcc/x86_64-pc-cygwin/11/../../../../x86_64-pc-cygwin/lib/../lib/../../include/w32api ../Day1.c -quiet -dumpbase Day1.c -dumpbase-ext .c -mtune=generic -march=x86-64 -g3 -O0 -Wpedantic -Wall -Wextra -Wconversion -std=c11 -version -fmessage-length=0 -o /cygdrive/c/Users/TPC/AppData/Local/Temp/ccoGk5b2.s
GNU C11 (GCC) version 11.2.0 (x86_64-pc-cygwin)
    compiled by GNU C version 11.2.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.24-GMP
chux - Reinstate Monica
  • The "sequence of letters and digits" is empty – so the conversion is invalid. It is reasonable to return the pointer to the `0` of `0x` because the input string was malformed. You might get the pointer to the `x` if the base is `0` instead of `16`. – Jonathan Leffler Mar 01 '22 at 20:06
  • @JonathanLeffler Confident base 0 or base 16 makes no difference here. – chux - Reinstate Monica Mar 01 '22 at 20:08
  • MSVC gives the same result and their [man page](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/strtol-wcstol-strtol-l-wcstol-l?view=msvc-170) says "If the first character is '0' and the second character is 'x' or 'X', the string is interpreted as a hexadecimal integer." Base 0 and 16 behave the same. – Weather Vane Mar 01 '22 at 20:09
  • If it's any help, on macOS (Big Sur, 11.6.3), the program prints `val:0, errno:0, endptr:<q>` and `val:0, errno:0, endptr:<x>`, as you expected. And choosing base 0 instead of 16 makes no difference. I think it is mildly ambiguous; neither is unequivocally wrong. – Jonathan Leffler Mar 01 '22 at 20:10
  • To be safe, you should set `errno = 0;` before each call to `strtol()`, though as `errno` is not set to a non-zero value, you're OK as it stands. No function in the standard C library sets `errno` to zero; that is normally a behaviour copied by additional library functions. – Jonathan Leffler Mar 01 '22 at 20:14
  • @JonathanLeffler OK, `errno = 0` before both to set aside that concern. – chux - Reinstate Monica Mar 01 '22 at 20:15
  • I get the macOS behaviour on a Linux box too — RHEL 7.4 happens to be the machine I used (that is, the `endptr` points to `x`, not `0`, as you expected). – Jonathan Leffler Mar 01 '22 at 20:25
  • I get `endptr` pointing to `x` on CentOS Stream 8 (Glibc 2.28). Although I agree that the spec could be read multiple ways here, it sounds like there is consistent behavior across a pretty good sampling of the widely used C implementations. On what implementation does `endptr` get set to point to `0`? – John Bollinger Mar 01 '22 at 20:29
  • @JonathanLeffler I like your idea of "neither is unequivocally wrong", yet the `"0x"` is an _optional_ prefix and the sequence is based on the _longest_ that fits the form. `"0x"` does not fit the form: 0, x or X, _hex digits_, but does fit the form of _digit_. IAC, given competing choices, I’d expect the parsing to favor a successful conversion and not a failure. – chux - Reinstate Monica Mar 01 '22 at 20:35
  • @JohnBollinger Is your answer in the _Select build output info_ part of the post? _gcc/x86_64-pc-cygwin/11_ .... If not please advise how to determine. – chux - Reinstate Monica Mar 01 '22 at 20:38
  • Sorry, @chux-ReinstateMonica, I overlooked that bit. It does leave me curious about which C library is being employed, however. I don't see anything I recognize as a C standard library package on [Cygwin's package list](https://www.cygwin.com/packages/package_list.html) or on its [gcc package](https://www.cygwin.com/packages/summary/gcc-core.html)'s dependency list. I guess it may be using the MS C library, but the behavior you describe is inconsistent with the MS docs. – John Bollinger Mar 01 '22 at 20:50
  • @JohnBollinger Given [glibc not supported by Cygwin](https://stackoverflow.com/q/25952829/2410359) and [get Cygwin DLL version](https://stackoverflow.com/q/38675362/2410359) I think I am using cygwin-3.3.4-2 - a fairly recent version. – chux - Reinstate Monica Mar 01 '22 at 21:08
  • gcc Linux/glibc gives me `endptr:<x>` but my pretty old gcc Mingw64/MS CRT gives me `endptr:<0x>`. This appears to be a bug in the latter. I also found someone asking this very question back in 2016 - I believe the answer there is correct so I'll close this as a dupe. @chux-ReinstateMonica even commented on that answer back then :) – Lundin Mar 03 '22 at 11:45
  • @Lundin I looked, yet did not find that old post. Somehow it lurked in the back of my head that this was done before. Thanks for finding it. – chux - Reinstate Monica Mar 03 '22 at 12:26
