3

During an interview I was requested (among other things) to implement the following function:

int StrPrintF(char **psz, const char *szFmt, ...);

similar to sprintf, except instead of the already-allocated storage the function must allocate it itself, and return in the *psz variable. Moreover, *psz may point to an already-allocated string (on the heap), which may potentially be used during the formatting. Naturally this string must be free by the appropriate means.

The return value should be the length of the newly created string, or negative on error.

This is my implementation:

int StrPrintF(char **psz, const char *szFmt, ...)
{
    va_list args;
    int nLen;

    va_start(args, szFmt);

    if ((nLen = vsnprintf(NULL, 0, szFmt, args)) >= 0)
    {
        char *szRes = (char*) malloc(nLen + 1);
        if (szRes)
            if (vsnprintf(szRes, nLen + 1, szFmt, args) == nLen)
            {
                free(*psz);
                *psz = szRes;
            }
            else
            {
                free(szRes);
                nLen = -1;
            }
        else
            nLen = -1;
    }

    va_end(args);
    return nLen;
}

The question author claims there's a bug in this implementation. Not just a standard violation that may fail on particular esoteric systems, but a "real" bug, which by chance may fail on most systems.

It's also not related to usage of int instead of memory-capability-suited type, such as size_t or ptrdiff_t. Say, the strings are of "reasonable" size.

I really have no clue of what the bug could be. All the pointer arithmetic is ok IMHO. I even don't assume that two consequent invocations of vsnprintf produce the same result. All the variadic-handling stuff is also correct IMHO. va_copy is not needed (it's the responsibility of the callee that uses va_list). Also on x86 va_copy and va_end are meaningless.

I'll appreciate if someone can spot the (potential) bug.

EDIT:

After checking out the answers and comments - I'd like to add some notes:

  • Naturally I've built and run the code with various inputs, including step-by-step in debugger, watching the variables state. I'd never ask for help without trying things myself first. I saw no sings of problems, no stack/heap corruption, etc. Also I've run it in debug build, with the debug heap enabled (which is intolerant to heap corruption).
  • I assume that the function is called with valid parameters, i.e. psz is a valid pointer (not to confuse with *psz), szFmt is a valid format specifier, and all the variadic parameters are evaluated and correspond to the format string.
  • Calling free with NULL pointer is ok according to the standard.
  • Calling vsnprintf is ok with NULL pointer and size=0. It should return the resulting string length. MS-version, though not fully standard-compliant, does the same in this specific case.
  • vsnprintf won't exceed the specified buffer size, including the 0-terminator. Means - it does not always places it.
  • Please put the coding style aside (if you don't like it - fine with me).
valdo
  • 12,632
  • 2
  • 37
  • 67
  • Did you try compiling and running some test code on your machine? – David Grayson Apr 09 '12 at 06:16
  • There's a problem with `snprintf()`, but I'm not sure that it applies to `vsnprintf()`... – Ignacio Vazquez-Abrams Apr 09 '12 at 06:17
  • I believe that the docs for `vsnprintf` state that the output is *always* null-terminated, so your call `vsnprintf(NULL, ...)` may cause a seg fault. – inspector-g Apr 09 '12 at 06:24
  • First some remarks on style. Why do you put the assignment in the `if` statement? Why cast the return of `malloc`? These are bad style, not too convincing for an interview. And if you'd separate your first call to `vsnprintf` out of its (useless) context and give it the correct context that a `v...` call needs you'd also see your bug. (Which is there and real.) – Jens Gustedt Apr 09 '12 at 06:41
  • @David Grayson: of course. Of course I compiled & run this on my machine, with several variants of input, step-by-step in debugger and etc., and saw no sings of problems. I try to do things myself first, before asking for help – valdo Apr 09 '12 at 06:55
  • @inspector g: Docs of `vsnprintf` state that it's **not always** 0-terminated. RTM – valdo Apr 09 '12 at 06:55
  • @Jens Gustedt: Let's put the styling aside (though it's ok in my taste), can you please explain what do you mean by "giving the correct context to a `v...` needs"? – valdo Apr 09 '12 at 07:00
  • @valdo `man vsnprintf` on my system (Mac OS 10.6) states "The output is always null-terminated." If you're on another system then this would explain the confusion! – inspector-g Apr 09 '12 at 07:16
  • 1
    @inspector g: For curiosity I've checked the MAC docs, here http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man3/printf.3.html , "... The snprintf() and vsnprintf() functions will write **at most n-1 of the characters** printed into the out-put output put string (the n'th character then gets the terminating `\0'); if the return value is greater than or equal to the n argument, the string was too short and some of the printed characters were discarded. **The output is always null-terminated**...". – valdo Apr 09 '12 at 07:31
  • @inspector g: (so that there's no problem) – valdo Apr 09 '12 at 07:31
  • @hellork: the general context of printf-like usage into a heap-allocated buffer implies *psz must be either NULL or a pointer to heap, never a string literal. – Tony Delroy Apr 09 '12 at 07:38
  • nevermind, I just re-read the description – hellork Apr 09 '12 at 07:39
  • One thing I notice, it does not return -1 if malloc fails. Oops, yes it does. Hmm. I copied it wrong. Still looking... – hellork Apr 09 '12 at 07:54
  • @hellork: thanks for still looking :) I tend to think the question author is either wrong or has an unusual sense of humor... – valdo Apr 09 '12 at 08:06
  • @valdo, a call to the `v...` type of functions needs a `va_start` before and a `va_end` after, since it "consumes" the `va_arg`. But @cnicutar in the mean time has killed the suspence, so you now know that you'd have to use a second copy of your `va_arg`. – Jens Gustedt Apr 09 '12 at 08:19
  • @valdo Thanks for checking up on the docs. That's what I get for using `grep` :) – inspector-g Apr 09 '12 at 15:50

5 Answers5

9

va_copy is not needed (it's the responsibility of the callee that uses va_list)

Not quite right. I didn't find any such requirement for vsnprintf in the C11 standard. It does say this in a footnote:

As the functions vfprintf, vfscanf, vprintf, vscanf, vsnprintf, vsprintf, and vsscanf invoke the va_arg macro, the value of arg after the return is indeterminate.

When you call vsnprintf, the va_list can be passed by value or by reference (it's an opaque type for all we know). So the first vsnprintf can actually modify va_list and ruin things for the second. The recommended approach is to make a copy using va_copy.

And indeed, according to this article it doesn't happen that way on x86 but it does on x64.

cnicutar
  • 178,505
  • 25
  • 365
  • 392
  • Affirmative. Linux man pages 3.32 says something to the effect that the value of ap is _undefined_ after the call. – hellork Apr 09 '12 at 08:49
  • A good point, thanks. Actually I don't see the rationale behind passing `va_list` "by reference" (there're obviously no references on C) or whatever indirect way. MSVC uses the same strategy on AMD64 as on x86 (checked right now). But, talking purely in term of standard, this is a bug indeed. I was aware of the existence of `va_copy`, but I was pretty sure that it's the responsibility of the callee to preserve the state of `va_list`. – valdo Apr 09 '12 at 08:51
1

The first call to vsnprintf() is really an attempt to get the length of the final string. However, it has a side effect! It moves the variable argument to the next one in the list as well. So, the next call to vsnprintf() does not have the first argument in the list captured. The easy hack is to reset the variable argument list to start again once you get the length from the first vsnprintf(). Maybe there's another way to do this better but, yeah, that's the issue.

Joe James
  • 11
  • 1
0

The first argument of vsnprintf should not be null according to:

http://msdn.microsoft.com/en-us/library/1kt27hek(v=vs.80).aspx

Edit 1: You should not free *psz if it is null!

David Grayson
  • 84,103
  • 24
  • 152
  • 189
  • @David Grayson: (1) See at the end of the microsoft manual, starting from VS2005 SP1 this is changed such that one is **allowed** to put `NULL` iff the buffer size=0. (2) According to the standard `free` **may** be called with `NULL` pointer – valdo Apr 09 '12 at 06:52
  • @valdo: That's just a user comment, not part of the manual. It might be true but I can't be sure if it's true without trying it. The docs for VS11 still say that the buffer and format args should be non-null: http://msdn.microsoft.com/en-us/library/1kt27hek(v=vs.110).aspx – David Grayson Apr 09 '12 at 18:34
0

Moreover, *psz may point to an already-allocated string (on the heap), which may potentially be used during the formatting.

For *psz to be potentially reusable, some indication of whether it's garbage or a valid heap pointer is needed. Given no function argument indicating that, you can assume the only sane convention of a NULL sentinel value.... i.e. if *psz is not NULL, then you can reuse it provided that the data you wish to format can fit into the same space. As the function is not given any indication of the amount of memory previous allocated, you can either: - use realloc and trust it to avoid needless movement of the buffer - infer a minimum pre-existing buffer size from strlen() - this would mean that if you're say writing a long string then a short string then the original long string into the buffer, the last operation will needlessly replace the buffer.

Clearly realloc is a better bet.

int StrPrintF(char **psz, const char *szFmt, ...)
{
     va_list args;
     int nLen;
     va_start(args, szFmt);
     if ((nLen = vsnprintf(NULL, 0, szFmt, args)) >= 0)
     {
         char *szRes = (char*) realloc(psz, nLen + 1);
                             // ^ realloc does a fresh allocation is *psz == NULL
         if (szRes)
             vsnprintf(*psz = szRes, nLen + 1, szFmt, args); // can't fail
                       // ^ note the assignment....
         else
             nLen = -1;
     }
     va_end(args);
     return nLen;
} 

Note too - from a Linux manpage for printf() - if your sprintf() doesn't return a useful length you've got to get/write an implementation that does....

Concerning the return value of snprintf(), SUSv2 and C99 contradict each other: when snprintf() is called with size=0 then SUSv2 stipulates an unspecified return value less than 1, while C99 allows str to be NULL in this case, and gives the return value (as always) as the number of characters that would have been written in case the output string has been large enough.

Tony Delroy
  • 102,968
  • 15
  • 177
  • 252
  • Thanks for your answer. Yes, the sane convention is that `*psz` is `NULL` iff it does not hold an already-allocated string. Your idea with `realloc` failes into typical gotcha one would expect in such a question. Note that you reallocate the `*psz` **before** the actual resulting string is formatted, i.e. before the 2nd invocation of `vsnprintf`. But what if one of the function parameters is the current value of `*psz`? I mean, one may write `StrPrintF(&s, "hello %s, s);` Doing `realloc` may free the `*psz` preliminary! – valdo Apr 09 '12 at 07:27
  • @valdo: you can't expect `sprintf(buffer, "hello %s", buffer)` to work - I'd say it's reasonable for an implementation to have similar preconditions / usage constraints as `sprintf()`.... (Explanation: say buffer has "x\0", sprintf() starts copies "hello " over "x\0" and the following characters at which point buffer is not guaranteed NUL terminated, so the `%s` copy may run off indefinitely) – Tony Delroy Apr 09 '12 at 07:34
  • But our function has another semantics, and your assumptions of what to expect from it are irrelevant. I didn't provide the full interview question (it consisted of several parts), but one can conclude from it that this function is expected to work correctly in such a case, i.e. doing `StrPrintF(&s, "hello %s, s);` should be supported. – valdo Apr 09 '12 at 07:43
  • Anyway, do you see a problem with the **existing** implementation? – valdo Apr 09 '12 at 07:44
  • "moreover, *psz may point to an already-allocated string (on the heap), which may potentially be used during the formatting" <- I was understanding this to mean that the buffer may be re-used, but can see it could also describe a special requirement for the `f(buffer, "x %s", buffer)` usage discussed... it's ambiguous - my answer assumes you must support reuse which your existing implementation doesn't. – Tony Delroy Apr 09 '12 at 07:52
-1

Without giving you the answer outright: check your inputs.

geekosaur
  • 59,309
  • 11
  • 123
  • 114