16

Question

Is there any way to get the maximum size of any string correlated with errno at compile time (at preprocessor time would be even better)? E.g. an upper bound on strlen(strerror(errno))?

My Thoughts

The best I can think of is running a program to do a brute-force search over the range of an int, over each locale, to get the string associated with each {errno, locale} pair, get its size, and generate a header on that system, then hooking that into e.g. a makefile or autoconf or whatever. I can't think of a better way to do it, but it seems ridiculous that it would be so: the standard library for a system has that information built-in, if only implicitly. Is there really no good way to get that information?
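A minimal sketch of the brute-force generator described above, for one locale only. It assumes a bounded scan is acceptable instead of the full range of an int: `SCAN_LIMIT` and the macro name `ERRNO_STR_MAX_OBSERVED` are made up for illustration, and the result is only what was observed on the build machine, not a guaranteed bound.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical scan limit: an assumption for this sketch, not a real bound
 * on valid error numbers. */
#define SCAN_LIMIT 4096

/* Longest strerror() text for error numbers 0..limit-1 in the current locale. */
static size_t longest_error_text(int limit)
{
    size_t longest = 0;
    for (int e = 0; e < limit; e++) {
        const char *msg = strerror(e);
        if (msg != NULL) {
            size_t len = strlen(msg);
            if (len > longest)
                longest = len;
        }
    }
    return longest;
}

/* Write a one-macro header to `out`. ERRNO_STR_MAX_OBSERVED is a made-up
 * name. Returns the fprintf() result (negative on output error). */
static int emit_header(FILE *out, int limit)
{
    return fprintf(out, "#define ERRNO_STR_MAX_OBSERVED %zu\n",
                   longest_error_text(limit));
}
```

A generator program would presumably call `setlocale(LC_ALL, "")` (or loop over the installed locales) and then `emit_header(stdout, SCAN_LIMIT)`, with the build system redirecting the output into a header file.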

Okay, I'll admit the C and/or C++ standards might permit for error strings generated at runtime, with e.g. specific-to-circumstance messages (e.g. strerror(EINVAL) giving a string derived from other runtime metadata set when errno was last set, or something) - not sure if that is allowed, and I'd actually welcome such an implementation, but I've never heard of one existing which did so, or had more than one string for a given {errno, locale} pair.

Motivation

For context, what led to this question was wanting to use the error string associated with errno in a call to writev (though I think the question is valuable more generally, as was discussed in the comments). In my specific use case, I was using strings out of argv and errno-linked strings. This set my "worst-case" length to ARG_MAX + some max errno string length + the size of a few other small strings.

Every *nix document I've consulted seems to indicate writev will (or "may", for what little good that difference makes in this case) error out with errno set to EINVAL if the sum of the iov_len values overflows SSIZE_MAX. Intuitively, I know every errno string I've seen is very short, and in practice this is a non-issue. But I don't want my code mysteriously failing to print an error at all on some system if it's possible for this assumption to be false. So I wrote code to handle such a case - but at the same time, I don't want that additional code being compiled in for the platforms which generally clearly don't need it.
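The kind of runtime check this implies can be sketched as follows. This is only an illustration, assuming a POSIX system where `<limits.h>` exposes `SSIZE_MAX`; the helper name `iov_total_fits` is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* ask <limits.h> for SSIZE_MAX */
#endif
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* True if the iov_len values sum to at most SSIZE_MAX. The sum itself is
 * never formed, so the check cannot overflow size_t. */
static bool iov_total_fits(const struct iovec *iov, int iovcnt)
{
    size_t remaining = (size_t)SSIZE_MAX;
    for (int i = 0; i < iovcnt; i++) {
        if (iov[i].iov_len > remaining)
            return false;
        remaining -= iov[i].iov_len;
    }
    return true;
}
```

A caller could test `iov_total_fits(iov, iovcnt)` before calling writev and fall back to splitting or truncating the output when it returns false.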

The combined input of the answers and comments so far is making me lean towards thinking that in my particular use-case, the "right" solution is to just truncate obscenely long messages - but this is why I asked the question how I did initially: such information would also help select a size for a buffer to strerror_r/strerror_s (*nix/Windows respectively), and even a negative answer (e.g. "you can't really do it") is in my view useful for others' education.

Related

This question contains answers for the strings given by strerror_r on VxWorks, but I don't feel comfortable generalizing that to all systems.

mtraceur
  • 2
    C and C++ are different languages! – too honest for this site Mar 30 '16 at 20:20
  • 9
    @Olaf While I generally appreciate your efforts in making this distinction clear, I believe that in this case, speaking of C and C++ together is legit as the `errno` concept is used by both of them alike. Choosing either tag over the other one would be arbitrary. – 5gon12eder Mar 30 '16 at 20:25
  • Wouldn't it also depend on the locale being used? – Mark Nunberg Mar 30 '16 at 20:28
  • @5gon12eder: I'm afraid I have to disagree. The `errno` & related are part of the C compatibility interface and the libc, thus not native C++/libc++. Said that, please notice the question explicitly references the C standard. – too honest for this site Mar 30 '16 at 20:30
  • 1
    Can you choose your own max, and clip the error messages at runtime? – xrgb Mar 30 '16 at 20:30
  • 2
    I might be missing something, but an error message larger than ssize_t seems completely crazy on any platform where ssize_t is larger than int8_t. – zneak Mar 30 '16 at 20:35
  • @Olaf I worded it that way because of an assumption that C++ just followed whatever the C standard says, or that failing that, it would at most be _less_ permissive rather than _more_ permissive vs. the C standard. Still, I've re-worded that phrase to now include both. Would you say that this question has no meaningful relevance to the C++ world, or that C++ would have a substantially different answer than the answer for C? – mtraceur Mar 30 '16 at 20:40
  • @MarkNunberg Yes. I tried to word the question to account for that - feel free to suggest/edit clarifications to the wording if that's not clear. – mtraceur Mar 30 '16 at 20:41
  • @mtraceur: I really think you should have left it with the C standard. Adding C++ makes the question possibly even broader as now C++ features come into play. If you want a C++ solution, though, just remove the C tag. Either way, a solution can very well depend on the language you use. Notice that identical syntax does not imply identical semantics! – too honest for this site Mar 30 '16 at 20:43
  • @xrgb Yes - I can work around the entire problem at runtime, whether that way or without even clipping them by wrapping writev. But knowing at compile-time would allow preprocessor macros to exclude the unnecessary code, which I think is valuable for several reasons. – mtraceur Mar 30 '16 at 20:51
  • 1
    The C (and I suspect the C++) standard will not help concerning `SSIZE_MAX` as it does not define `SSIZE_MAX`. If `SSIZE_MAX` is important to your code, suggest tagging the environment of interest that defines it. – chux - Reinstate Monica Mar 30 '16 at 21:01
  • 1
    Note, runtime generation *does* happen for out-of-range values (> 134 on Linux), but this generates smaller strings than many in-range values. But including additional metadata is forbidden because there's no saying when or whether the error actually occurred. – o11c Mar 30 '16 at 21:06
  • @zneak: Actually, if I remember the standards right, `ssize_t` must always be larger than `int8_t`: `ssize_t` is the signed equivalent of `size_t`, and `size_t` is bound to at a minimum support values up to `65 535`, and the latest POSIX/SUS require `ssize_t` to at least support values up to `32 767`. However, consider that there are multiple maxima going into a typical error message's maximum size: e.g. `ARG_MAX + MAGICAL_ERR_STR_MAX + sizeof(my_biggest_additional_error_text) - 1` together must be smaller than `SSIZE_MAX` for it to be safe to ignore the possibility of overflowing `SSIZE_MAX`. – mtraceur Mar 30 '16 at 21:08
  • @chux: While the `SSIZE_MAX` and `writev` details were an example of my current usecase, the question itself is more broadly relevant: if I were to port my code to Windows' `WriteFileGather` or whatever, the constants to check against change, but knowing the maximum length of an error string remains relevant "would be nice to have" information. – mtraceur Mar 30 '16 at 21:11
  • @mtraceur Certainly `strlen(any_string) <= SIZE_MAX` is true. `strlen(strerror(errno))` is not defined by C to be more constrained. So that the portable limit. – chux - Reinstate Monica Mar 30 '16 at 21:16
  • @chux: Sure, that's the maximum possible limit that logically follows from the standards' definitions of those functions, yes. But surely the existence of unknown-unknowns justifies asking, in case there was some other, smaller limit. Also note: you can write portable code which takes advantage of smaller non-portable limits if those limits have a portable way to check for their presence (e.g. imagine a `fooUNIX` which defines a macro `ERR_STR_LEN_MAX` and a macro `__FOO_UNIX` that can be checked for with an `#ifdef`.) – mtraceur Mar 30 '16 at 21:51
  • @Olaf: After some serious consideration, I have to disagree in this special case, because the question also has an "at compile time" limitation: I know C++ can do some serious magic with template metaprogramming and the like, but it doesn't seem like (especially given the answers that come in), that there's a difference between the two languages in this regard: the answer more or less seems to come down to "no" or "no (except maybe on some systems there's a preprocessor macro defined that gives an upper bound)", which is equally applicable to either language. – mtraceur Mar 30 '16 at 23:15
  • 1
    @mtraceur I agree this is a good post (hence my UV), it is just that, IMO, there is no possible answer that is highly satisfactory. It is certainly reasonable to survey a number of diverse systems, check their max error message lengths, and use that limit, with x4 for margin. In this special case of error handling, the whole system is in question and an excessively long error message should be viewed with suspicion. IMO, it makes sense to limit to a few hundred bytes and print a truncated valid message rather than print a 2G byte message from a degraded system. – chux - Reinstate Monica Mar 30 '16 at 23:23
  • 1
    @mtraceur: There might be a meta-programming solution: Write a program which – at build-time – gets all strings, calculates the length and passes a macro with the max. length to the module or generates a config-file. With a proper build-system, this can be written directly in the build-script. SCons comes into mind, as that is written in Python and the build-scripts are actually Python programs with full access to the features. Doner properly, this is only run once per clean build, so no real impact on incremental builds. – too honest for this site Mar 31 '16 at 00:10
  • 1
    Based on answers going in different directions, could you clarify something: is your concern that some system might have some errno string whose length is close to `SSIZE_MAX` (incredibly unlikely)? Or are you planning to write a number of bytes that is already close to `SSIZE_MAX`, and you are concerned that a moderately long errno string might push you over? – Nate Eldredge Mar 31 '16 at 05:16
  • 1
    @chux: I see. I suspected that I wouldn't find a satisfactory answer, but a definitive "no" answer is still valuable, I think. As to the error printing, I agree: a system with huge error messages shouldn't really be "trusted". The more I think about it the more I'm starting to fathom just how ridiculous my concern was: honoring an error message which for some reason turns out to be huge may well be the wrong behavior, and I've been unquestioningly thinking it's the "correct" thing for me to account for huge error messages. Truncation may well be the most "right" thing to do in such cases. – mtraceur Mar 31 '16 at 05:52
  • @Olaf: Agreed re: metaprogramming, but to clarify, my last comment specifically mentioned C++'s _template_ metaprogramming, in the sense that the only way I could see this developing different answers for the two languages (C vs. C++) was if C++'s templates or other features could be abused to do that search at compile-time, from within the source of the being-compiled program. I think(?) any external program, while an interesting solution, would be outside the scope of the question, since my question explicitly asks if there's a better way than an external brute-force search to generate a header. – mtraceur Mar 31 '16 at 05:58
  • @NateEldredge: The latter was the line of reasoning that initially caused me to ask the question, though I thought the question was more valuable as stated. Still, I'll add it to the "Motivation" section in my question, as it's likely useful contextual information. – mtraceur Mar 31 '16 at 06:01
  • @mtraceur: I did not reply to your comment. It was just an idea which came to mind, unrelated to the preceding comment. (btw., typo: "Doner" should be "Done"; not related to the tasty Turkish food - maybe I was a bit hungry:-) – too honest for this site Apr 01 '16 at 13:40

4 Answers

16

The C library that you build against may not be the same library (an ABI-compatible C library may be used), or even the same version of the library (on GNU/Linux, consider glibc 2.2.5 vs. glibc 2.23), that you run against. The maximum size of the locale-dependent string returned from strerror can therefore only be computed at runtime, during process execution. On top of this, the locale translations may be updated on the target system at any time, which again invalidates any pre-computation of this upper bound.

Unfortunately there is no guarantee that the values returned by strerror are constant for the lifetime of the process, and so they may also change at a later time, thus invalidating any early computation of the bound.

I suggest using strerror_r to save the error string. This avoids problems with libraries that are not thread-aware and might call strerror, changing the string while you are copying it. Then, instead of translating the string on the fly, you would use the saved result, and potentially truncate it to SSIZE_MAX (never going to happen in reality).
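A minimal sketch of saving the string this way, assuming the XSI variant of strerror_r (the one that returns int; glibc declares a different, char *-returning variant when `_GNU_SOURCE` is defined). `ERR_BUF_SIZE` and `saved_error_text` are hypothetical names, and the buffer size is an arbitrary choice. Because the buffer contents are not guaranteed after a failed call, the sketch falls back to a numeric message instead of trusting them.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* request the XSI strerror_r (returns int) */
#endif
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define ERR_BUF_SIZE 256  /* hypothetical cap, chosen for illustration */

/* Copy the message for errnum into buf (size bytes) and return buf.
 * If strerror_r() reports failure (e.g. ERANGE for a too-small buffer),
 * the buffer is overwritten with a short numeric fallback. */
static const char *saved_error_text(int errnum, char *buf, size_t size)
{
    if (size == 0)
        return "";
    int rc = strerror_r(errnum, buf, size);
    if (rc != 0) {
        /* snprintf truncates and always terminates when size > 0. */
        snprintf(buf, size, "error %d", errnum);
    }
    buf[size - 1] = '\0';  /* defensive: guarantee termination */
    return buf;
}
```

The saved text can then be placed in an iovec with `iov_len` set to `strlen(buf)`, which is bounded by `ERR_BUF_SIZE - 1` regardless of what the C library returns.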

Carlos O'Donell
  • +1, though note that having an upper bound would be great for choosing a size for the buffer passed to `strerror_r` as well, reducing the need for runtime logic. Also, I had considered and was aware of the maxima-could-change-out-from-under-your-program situation - but I was hoping that there was some standard limitation, or at least platform-specific macros we could check on at least some systems, that could've possibly made that a non-issue for at least some systems, hence the question. – mtraceur Mar 30 '16 at 21:40
  • 1
    In particular, it's noteworthy that POSIX (per your link) allows `strerror_r` to fail with an `ERANGE` if the buffer given to it isn't big enough: my reading suggests that an implementation which checks for the length first, and errors out without writing anything, would be standard-compliant. I want to believe no implementation does this, or that I'm misreading, but if so, then without knowing an upper bound it seems we must either err on the side of a moderately generous buffer to hold it, or loop to reallocate until it returns success? – mtraceur Mar 31 '16 at 06:22
  • 1
    There is no guarantee that the `strerrorbuf` was modified by a call to `strerror_r` which returned `ERANGE`. So you must be prepared to handle the various combinations of returns as specified in the normative text of the standard. – Carlos O'Donell Apr 01 '16 at 15:46
  • After a few days of contemplation, I am (tentatively) accepting this answer, because it is both 1) general and 2) provides the most thorough explanations as to why such information could not (and perhaps should not) be available at compile time for many implementations. – mtraceur Apr 01 '16 at 17:35
5

I'm not aware that the C or C++ standards make any assertions regarding the length of these messages. The platforms you're interested in might provide some stronger implementation-defined guarantees, though.

For example, for POSIX systems, I found the following in limits.h.

The following constants shall be defined on all implementations in <limits.h>:

  • […]
  • {NL_TEXTMAX}
    Maximum number of bytes in a message string.
    Minimum Acceptable Value: {_POSIX2_LINE_MAX}

I believe that error messages produced by strerror would fall into this category.

That said, I'm unable to get this macro on my system. However, I do have _POSIX2_LINE_MAX (from <unistd.h>). It is #defined to 2048. Since the standard only says that this is a lower bound, that might not be too helpful, though.
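A sketch of how code might probe for this limit at preprocessor time, since the macro may or may not be visible depending on the platform and feature-test macros. `SMALL_MESSAGE_LIMIT`, `MESSAGES_KNOWN_SMALL` and `message_text_bound` are hypothetical names, and whether `NL_TEXTMAX` really covers strerror messages remains an assumption.

```c
#include <limits.h>

/* Hypothetical threshold below which the extra handling could be skipped. */
#define SMALL_MESSAGE_LIMIT 4096

/* Decide at preprocessor time whether a small bound is advertised. */
#if defined(NL_TEXTMAX) && (NL_TEXTMAX <= SMALL_MESSAGE_LIMIT)
#define MESSAGES_KNOWN_SMALL 1
#else
#define MESSAGES_KNOWN_SMALL 0
#endif

/* Report the advertised bound, or -1 if <limits.h> did not expose one. */
static long message_text_bound(void)
{
#ifdef NL_TEXTMAX
    return (long)NL_TEXTMAX;
#else
    return -1;
#endif
}
```

On a system that defines `NL_TEXTMAX` as a very large value, `MESSAGES_KNOWN_SMALL` ends up 0 and the long-message handling stays compiled in.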

5gon12eder
  • 1
    I was able to get `NL_TEXTMAX` defined by defining `_GNU_SOURCE`. I therefore suspect that a sufficiently "high" value defined for `_POSIX_C_SOURCE` would get it included. Anyway, the value was `0x7fffffff`, or 2147483647. Not too inspiring on its own - though `SSIZE_MAX` was twice as wide (8 more trailing `f` digits). – mtraceur Mar 31 '16 at 05:06
  • For the record, I +1'ed this as well, and I appreciated the information given with regard to my specific use-case. Even though I currently accepted another answer, I think this is the next-best answer, because it covers information not given in the main answer, and does suggest (albeit inconclusively) that some implementations, or other standards built on top of the language specifications, might in fact define some sort of upper limits. – mtraceur Apr 01 '16 at 17:41
3

The standards make no guarantees about the size limits of the null-terminated string returned by strerror.

In practice, this is never going to be an issue. However, if you're that paranoid about it, I would suggest that you just copy the string returned from strerror and clamp its length to SSIZE_MAX before passing it to writev.

Toby Speight
Colin Basnett
  • 3
    I'm inclined to agree that the standard makes no guarantee in this area, but I don't think cppreference.com is an authoritative reference for that. – John Bollinger Mar 30 '16 at 20:43
2

It is safe to assume that SSIZE_MAX will be greater than the length of the longest string (character array) that strerror returns on a normal C or C++ system. This is because usable system memory (usable directly by your C program) can be no larger than SIZE_MAX (an unsigned integer value), and ssize_t will have at least the same number of bits as size_t. Accounting for the sign bit under two's complement, SSIZE_MAX will therefore be at least 1/2 the size of system memory.

nategoose
  • 2
    Detail: "This is because usable system memory (usable directly by your C program) can be no larger than SIZE_MAX" --> disagree, `SIZE_MAX` is the max value of an array index for a _single_ array, not the maximum amount of memory. Concerning `SSIZE_MAX`: that is not defined in C, so best to cite your reference. – chux - Reinstate Monica Mar 30 '16 at 20:48
  • `SIZE_MAX` is also the maximum size of the portion of a C2011 `struct` up to but not including the last member, because it is the type of the result of the standard `offsetof()` macro. – John Bollinger Mar 30 '16 at 20:53
  • @chux: standard C has a flat (von Neumann) memory model, and all of that memory may be treated as a single array (`((char *)0)[x]`). C compilers for Harvard (like most AVR and 8051 MCUs), segmented (like 80286), or other systems often use non-standard language extensions to achieve access to the different areas of memory or introduce restrictions to language features to avoid some memory model issues. The answer of http://stackoverflow.com/questions/8649018/what-is-the-difference-between-ssize-t-and-ptrdiff-t disagrees, but writev with those limitations would be severely limited. – nategoose Mar 31 '16 at 19:41
  • Post C standard references that support that. By my reading there are none, so I disagree with the unsupported "standard C has a flat (von Neumann) memory model, and all of that memory may be treated as a single array (((char *)0)[x])" and _only_ that model. – chux - Reinstate Monica Mar 31 '16 at 19:46