3

I am writing code that needs to format a string, and I want to avoid buffer overruns.

I know that if vsnprintf is available (C99 onwards) we can do:

char* formatString(const char *format, ...)
{
    char* result = NULL;
    va_list ap;
    va_start(ap, format);

    /* Get the size of the formatted string by getting vsnprintf return the
     * number of remaining characters if we ask it to write 0 characters */
    int size = vsnprintf(NULL, 0, format, ap);

    if (size > 0)
    {
        /* String formatted just fine */
        result = (char *) calloc(size + 1, sizeof(char));
        vsnprintf(result, size + 1, format, ap);
    }

    va_end(ap);
    return result;
}

I can't figure out a way of doing something similar in C90 (without vsnprintf). If it turns out to not be possible without writing extremely complex logic I'd be happy to set a maximum length for the result, but I'm not sure how that could be achieved either without risking a buffer overrun.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
user2891462
  • 3,033
  • 2
  • 32
  • 60
  • 4
    If porting for newer C version is not an option, I would look for open source alternatives for `vsnprintf`. Simpler `vsprintf` is simply too broken to be usable. – user694733 Sep 27 '18 at 12:44
  • 2
    The main reason `vsnprintf()` was added to C99 was that it is hard to protect `vsprintf()` or similar. One workaround is to open `/dev/null`, use `vfprintf()` to format the data to it, note how big a result was needed, and then decide whether it is safe to proceed. Icky, especially if you open the device on each call. – Jonathan Leffler Sep 27 '18 at 14:09
  • @JonathanLeffler Thanks for the tip, feel free to post it as an answer (with the caveats) :) – user2891462 Sep 28 '18 at 06:23

2 Answers2

2

Pre-C99 affords no simply solution to format strings with a high degree of safety of preventing buffer overruns.

It is those pesky "%s", "%[]", "%f" format specifiers that require so much careful consideration with their potential long output. Thus the need for such a function. @Jonathan Leffler

To do so with those early compilers obliges code to analyze format and the arguments to find the required size. At that point, code is nearly there to making you own complete my_vsnprintf(). I'd seek existing solutions for that. @user694733.


Even with C99, there are environmental limits for *printf().

The number of characters that can be produced by any single conversion shall be at least 4095. C11dr §7.21.6.1 15

So any code that tries to char buf[10000]; snprintf(buf, sizeof buf, "%s", long_string); risks problems even with a sufficient buf[] yet with strlen(long_string) > 4095.

This implies that a quick and dirty code could count the % and the format length and make the reasonable assumption that the size needed does not exceed:

size_t sz = 4095*percent_count + strlen(format) + 1;

Of course further analysis of the specifiers could lead to a more conservative sz. Continuing down this path we end at writing our own my_vsnprintf().


Even with your own my_vsnprintf() the safety is only so good. There is no run-time check that the format (which may be dynamic) matches the following arguments. To do so requires a new approach.

Cheeky self advertisement for a C99 solution to insure matching specifiers and arguments: Formatted print without the need to specify type matching specifiers using _Generic.

chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256
1

Transferring comments to answer.

The main reason vsnprintf() was added to C99 was that it is hard to protect vsprintf() or similar. One workaround is to open /dev/null, use vfprintf() to format the data to it, note how big a result was needed, and then decide whether it is safe to proceed. Icky, especially if you open the device on each call.

That means your code might become:

#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

extern char *formatString(const char *format, ...);

char *formatString(const char *format, ...)
{
    static FILE *fp_null = NULL;

    if (fp_null == NULL)
    {
        fp_null = fopen("/dev/null", "w");
        if (fp_null == NULL)
            return NULL;
    }

    va_list ap;

    va_start(ap, format);
    int size = vfprintf(fp_null, format, ap);
    va_end(ap);

    if (size < 0)
        return NULL;

    char *result = (char *) malloc(size + 1);
    if (result == NULL)
        return NULL;

    va_start(ap, format);
    int check = vsprintf(result, format, ap);
    va_end(ap);

    assert(check == size);

    return result;
}

int main(void)
{
    char *r1 = formatString("%d Dancing Pigs = %4.2f%% of annual GDP (grandiose dancing pigs!)\n",
                            34241562, 21.2963);
    char *r2 = formatString("%s [%-13.10s] %s is %d%% %s\n", "Peripheral",
                            "sub-atomic hyperdrive", "status", 99, "of normality");

    if (r1 != NULL)
        printf("r1 = %s", r1);

    if (r2 != NULL)
        printf("r2 = %s", r2);

    free(r1);
    free(r2);
    return 0;
}

As written with fp_null a static variable inside the function, the file stream cannot be closed. If that's a bother, make it a variable inside the file and provide a function to if (fp_null != NULL) { fclose(fp_null); fp_null = NULL; }.

I'm unapologetically assuming a Unix-like environment with /dev/null; you can translate that to NUL: if you're working on Windows.

Note that the original code in the question did not use va_start() and va_end() twice (unlike this code); that would lead to disaster. In my opinion, it is a good idea to put the va_end() as soon after the va_start() as possible — as shown in this code. Clearly, if your function is itself stepping through the va_list, then there will be a bigger gap than shown here, but when you're simply relaying the variable arguments to another function as here, there should be just the one line in between.

The code compiles cleanly on a Mac running macOS 10.14 Mojave using GCC 8.2.0 (compiled on macOS 10.13 High Sierra) with the command line:

$ gcc -O3 -g -std=c90 -Wall -Wextra -Werror -Wmissing-prototypes \
>     -Wstrict-prototypes vsnp37.c -o vsnp37
$

When run, it produces:

r1 = 34241562 Dancing Pigs = 21.30% of annual GDP (grandiose dancing pigs!)
r2 = Peripheral [sub-atomic   ] status is 99% of normality
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • Such code would not be appropriate in security-sensitive contexts, because a malevolent caller could pass a string whose content will be changed (and made longer) by code in another thread. I think the best way to handle such functions is to write a generalized vxprintf function which accepts a double-indirect pointer to a function which will passed that pointer and any data to be output. The caller of the generalized function could create a struct whose first member was a pointer to the callback, and which would use the rest of the structure as needed. Such a function could... – supercat Oct 05 '18 at 19:24
  • ...then be used as the "core" for all other kinds of printf defined by the Standard, as well as many useful types that aren't (e.g. a 'graphics printf' to render text on a graphics screen). I suspect many implementations actually chain all their printf functions to a common routine, but the exact means of doing so vary. – supercat Oct 05 '18 at 19:26
  • I'm not sure how you think `vsnprintf()` would be protected from the malicious thread calling scenario you outline. – Jonathan Leffler Oct 05 '18 at 20:05
  • 2
    Given `vsnprintf(dest, n, "(%s)", evilstring);`, a good implementation of `vsnprintf` should limit the output to n-1 bytes followed by a terminator regardless of what `evilstring` does. If code uses `vsnprintf` once to find the length, allocates a buffer, and then uses `vsnprintf` again to do the outputting, and `evilString` changes from `Sup` to `Supercalifragilisticexpialidocious" between the two calls, the second call should store `(Supe` and a zero byte, and return 36 without overwriting anything past the end of the buffer. – supercat Oct 05 '18 at 20:15