18

I have to use IAR compiller in embedded application (it does not have namespaces, exceptions, multiple/virtual inheritance, templates are bit limited and only C++03 is supported). I can't use parameter pack so I tried to create member function with variadic parameter. I know variadic parameters are generally unsafe. But is safe to use this pointer in va_start macro?

If I use ordinary variadic function it would need a dummy parameter before ... to be able to access remaining parameters. I know variadic macro would not need parameter before ... but I would prefer not to use it. If I use member function it has hidden this parameter before ... so I tried it.:

struct VariadicTestBase{
  virtual void DO(...)=0;
};

struct VariadicTest: public VariadicTestBase{
  virtual void DO(...){
    va_list args;
    va_start(args, this);
    vprintf("%d%d%d\n",args);
    va_end(args);
  }
};

//Now I can do
VariadicTestBase *tst = new VariadicTest;
tst->DO(1,2,3);

tst->DO(1,2,3); prints 123 as expected. But I am not sure if it is not just some random/undefined behavior. I know tst->DO(1,2); would crash just like normal prinf would. I do not mind it.

Rakete1111
  • 47,013
  • 16
  • 123
  • 162
user11373693
  • 183
  • 7
  • 3
    The standard requires the argument to `va_start` to be the rightmost parameter in the parameter list ([\[cstdarg.syn\]](https://eel.is/c++draft/cstdarg.syn#1.sentence-2)). Since `this` is not in the parameter list, I'd say it's undefined behavior. – cpplearner Apr 17 '19 at 14:33
  • From a conceptual standpoint this is strange, since how is the callee supposed to know the number of arguments? The only sane place it can get it from is via `this`, but that means the number of arguments passed isn't a property of the invocation anymore, but of the object itself. That's pretty weird and it's not clear to me that it's a good design... and I'm not sure I've seen that in any language before. – user541686 Apr 17 '19 at 16:15
  • @Mehrdad: Consider a function that accepts an arbitrary number of strings and output their total length followed by their content. If one could get a `va_list` that started with the first string, one could use `va_copy` on that, use one loop that reads all the arguments from the original va_list and adds up the lengths, and then a loop that uses the copied va_list to output all the strings, without having to "special-case" the first string. – supercat Apr 17 '19 at 16:32
  • 1
    @supercat: But in that case you should be making the first string explicit. Varargs would imply that it could be absent, which is clearly not the case, unlike with the arguments that follow, so it really is a special case. And practically speaking it helps the compiler raise an error if you forget to specify it. – user541686 Apr 17 '19 at 16:36
  • @Mehrdad: Using a separate explicit argument for the first string pointer require that code make an extra copy of that pointer as well as the `va_list`. Further, any functions that are passed a va_list would need to be passed a first-string pointer in addition to the va_list. Not a terribly huge problem, but the code could be cleaner and more efficient if that weren't required. – supercat Apr 17 '19 at 16:46
  • @supercat: Yes, that's what `sprintf` et al. already do... – user541686 Apr 17 '19 at 16:49
  • @Mehrdad: As another couple considerations: (1) one could legitimately pass zero non-sentinel values, and forgetting the sentinel in the zero-other-items case seems so much less likely than forgetting it in other cases that the validation would be of little value; (2) in some cases where a function receives pointers to two methods, but will use one *only* for purposes of passing it to the second, the code through which the function pointer is passed shouldn't need to know or care about any of its argument types. – supercat Apr 17 '19 at 16:51
  • @Mehrdad: The first two arguments of `sprintf` are processed totally differently from any that follow, so I'm not sure what your point is. – supercat Apr 17 '19 at 16:52
  • @supercat: My point is that they still need to be passed around separately with `va_list` like you said, and that doesn't exactly pose a problem. – user541686 Apr 17 '19 at 16:57
  • @Mehrdad: If the first arguments to `sprintf` weren't processed separately, it would need to fetch them from the `va_list` using different code from everything else. The natural way to write an "output items until sentinel" function, however, would be to process the first item in the same loop as every other. – supercat Apr 17 '19 at 17:08
  • @supercat: Yes that's just `char const *fmt = va_arg(ap, char const *);`. Hardly something anyone loses sleep over. And for the loop it's also not exactly difficult to make it do one extra iteration for the first element and fetch that when `i == 0` or something. I have to say it would be nice if you didn't keep picking pointless arguments with me on StackOverflow though... – user541686 Apr 17 '19 at 17:13
  • @Mehrdad: You asked a question--how one could plausibly use the ability to have all arguments be variadic. I sought to describe a way that could be useful. Not indispensable, perhaps, but somewhat useful. BTW, can you name any languages other than C or C++ which support variadic arguments but require at least one fixed argument? C#, VB.Net, Javascript, and I think Java all allow one to define a function which can usefully accept an arbitrary number of parameters including zero. So far as I can tell, C and C++ are unique in *not* doing so. – supercat Apr 17 '19 at 17:44
  • @supercat: Those languages all pass the length in the background. C and C++ don't impose such overhead for you. And in any case, in none of the cases is the argument length a property of the object whose method is being called. Can you stop arguing please? – user541686 Apr 17 '19 at 17:50
  • I assume you thought of overloading the member function (which could get tricky since it’s `virtual`) or passing a data structure of variable size as the parameter. Is there a reason those aren’t good options? – Davislor Apr 17 '19 at 19:39

4 Answers4

20

Nothing specifies that behaviour in the standard, so this construct just invokes formal Undefined Behaviour. That means that it can work fine in your implementation and cause compilation error or unexpected results in a different implementation.

The fact that non static methods have to pass the hidden this pointer cannot guarantee that va_start can use it. It probably works that way because in the early times, C++ compilers were just pre-processors that converted C++ source to C source and the hidden this parameter was added by the pre-processor to be available to the C compiler. And it has probably be maintained for compatibility reasons. But I would try hard to avoid that in mission critical code...

Serge Ballesta
  • 143,923
  • 11
  • 122
  • 252
  • 2
    If it's undefined, then I'd go even further and say that not only can it cause different results on different implementations, but also with different runs on the same implementation. – cadaniluk Apr 17 '19 at 14:19
  • 2
    @cadaniluk Real world implementations are much more consistent than what is required by standard, whether their *extensions* are documented or not. Simply relying on that gives non portable code. – Serge Ballesta Apr 17 '19 at 14:26
  • 2
    @SergeBallesta: Some real-world compiler-writers try to be more consistent than what is required by the Standard. Others are more prone to view consistent behaviors as "missed optimization opportunities" regardless of whether the consistent behaviors might have been useful, or the "optimizations" are likely to offer any real benefit. – supercat Apr 17 '19 at 16:34
  • @SergeBallesta More consistent than is required, yes, but that does not mean that they are consistent enough to be relied upon. Even if an implementation always compiles the UB-containing code the same way (which is not guaranteed, but isn't unusual for some types of UB), and even if it compiles it the same way across all different levels of optimization (which is less likely), there's still no guarantee that the next release of the same compiler won't change a few things that result in different behavior in certain undefined cases. And trying to get such code to work again is a colossal pain. – Ray Apr 18 '19 at 01:12
  • @supercat: That's why I had said that it was UB and would give non portable code. My remark was just about *different runs on the same implementation*. It is certainly allowed per standard for UB. But in this specific use case, I still think that the risk is low. Just my opinion of course, only enforced by programming for forty years. – Serge Ballesta Apr 18 '19 at 07:52
  • @SergeBallesta: If one is using a compiler written by someone who values compatibility with earlier compilers, that is true. If one is using a compiler whose authors think decades of useful behavior is just "happenstance" resulting from "missed optimizations", however, then code which relies upon anything not mandated by the Standard--even constructs whose behavior the authors of the Standard thought were sufficiently obvious to not need mention--will be prone to fail. – supercat Apr 18 '19 at 13:32
3

Seems to be undefined behavior. If you look at what va_start(ap, pN) does in many implementations (check your header file), it takes the address of pN, increments the pointer by the size of pN and stores the result in ap. Can we legally look at &this?

I found a nice reference here: https://stackoverflow.com/a/9115110/10316011

Quoting the 2003 C++ standard:

5.1 [expr.prim] The keyword this names a pointer to the object for which a nonstatic member function (9.3.2) is invoked. ... The type of the expression is a pointer to the function’s class (9.3.2), ... The expression is an rvalue.

5.3.1 [expr.unary.op] The result of the unary & operator is a pointer to its operand. The operand shall be an lvalue or a qualified_id.

So even if this works for you, it is not guaranteed to and you should not rely on it.

  • You say that `va_start(ap, pN)` takes the address of `pN`. Are you sure that's what it does? What if `pN` is a parameter passed in a register, and therefore has no address? – John Zwinck Apr 17 '19 at 14:58
  • @JohnZwinck putting an ellipsis in the arguments should force the function/method into cdecl calling convention –  Apr 17 '19 at 14:59
  • @JohnZwinck ok, I see that gcc has va_* as a bunch of builtins that are free to do even crazy stuff like traverse multiple registers; not sure how "IAR compiler" does `va_start` –  Apr 17 '19 at 15:24
  • 2
    IAR defines: `va_start(ap, A) (ap._Ap = _CSTD __va_start1())`. So it ignores 2nd parameter. `va_start(args, 0);` works too. But it is ugly and not portable. – user11373693 Apr 17 '19 at 16:46
  • 1
    @user11373693 I think I remember borland c++ 3.1 ignoring the second arg as well. So you have the option of putting `va_start(ap, if this breaks for you - fix it)`, if someone uses a different compiler, at least they will find out :) –  Apr 17 '19 at 16:52
  • As I understand it historically varags were a hack that relied on pointer arithmetic with the last parameter. However I suspect few if any modern compilers do it that way, it's simply not compatible with modern ABIs. – plugwash Apr 17 '19 at 18:37
  • 1
    @jakub_d On intel + linux the ellipsis still passes first arguments by registers. As evidence, play with `printf("%d %lf", 1.2, 10)` and see [how it prints 10, 1.2](https://wandbox.org/permlink/oFlwfimqZX4BiP7W), since it looks at the registers not the stack. – Michael Veksler Apr 17 '19 at 19:08
2

I think it should be OK, though I doubt you will find a specific citation from the C++ standard which says so.

The rationale is this: va_start() must be passed the last argument to the function. A member function taking no explicit parameters has only a single parameter (this), which therefore must be its last parameter.

It will be easy to add a unit test to alert you if you ever compile on a platform where this doesn't work (which seems unlikely, but then again you are already compiling on a somewhat atypical platform).

John Zwinck
  • 239,568
  • 38
  • 324
  • 436
  • 4
    Is it possible for an ABI to assign a specific register for `this` which is never used in regular functions (hence breaking va_start)? Is it possible for `this` to be subject to special optimizations which break va_start? – Michael Veksler Apr 17 '19 at 13:48
  • @MichaelVeksler: I don't think the "special optimizations" part should be relevant, but I did consider the ABI aspect. I decided it is not very reasonable to expect ABI-related breakage, because ABIs already treat different parameters differently e.g. based on their type (int goes here, float goes there). It's upon the implementation to understand which types of parameters go where, and if `this` is passed in a specific register that's not really different to values of type `double` being passed in a specific register, is it? – John Zwinck Apr 17 '19 at 13:56
  • I still feel uncomfortable about this. Generally when something is not allowed explicitly in C++, it is undefined behavior. The way I understand it, `this` is not defined as a parameter in the standard, which is not compatible with the language of va_start – Michael Veksler Apr 17 '19 at 14:03
  • @MichaelVeksler: If `this` is not a parameter, how does `std::bind(&C::f, c)` work? – John Zwinck Apr 17 '19 at 14:06
  • 2
    you are talking about implementation details, not the language of the standard. The language does not say that`this` is a parameter. At least not in a way compatible with va_start language. Of course it's implemented as a parameter, but without explicit language guarantee the compiler might do something bad with it – Michael Veksler Apr 17 '19 at 14:12
  • I just figured out when it might break - singletons. The compiler might figure out that an object is a singleton, and avoid passing `this` as a parameter. In such a case, `this` becomes a global entity whose address is set at link or load time. – Michael Veksler Apr 17 '19 at 14:22
  • 1
    @JohnZwinck The behavior of `bind` is specified to use the `INVOKE` operation described [here](https://en.cppreference.com/w/cpp/utility/functional/invoke), which has special handling for member functions. This is just an abstraction layer, so it doesn't mean the ABI is compatible. – interjay Apr 17 '19 at 14:27
  • @MichaelVeksler: Can you describe or link to example code showing the specific type of singleton you're talking about? Singleton is a pattern which may be implemented in several ways, and I'm not sure what you're thinking of. – John Zwinck Apr 17 '19 at 14:31
  • 1
    _"It will be easy to add a unit test"_, how, in any reasonably reliable way? – Passer By Apr 17 '19 at 14:43
  • @JohnZwinck look at my answer below – Michael Veksler Apr 17 '19 at 14:46
  • 1
    @PasserBy: Here are some ideas: (1) Call the function, check the results (via passing the args to snprintf(), vsnprintf(), etc). (2) Do the same thing but running inside UBSan, valgrind, etc. (3) Compile but do not assemble, check in the generated assembly code for a simple example of both callee and caller, and diff (in case your platforms ever change). – John Zwinck Apr 17 '19 at 14:48
  • @MichaelVeksler: At which point you get a compiler error as the compiler tries to take the address of the register. It might be an intelligent one, or it might just say internal compiler error. Now if it said "can't take the address of a variable declared register" it would be really nice. – Joshua Apr 18 '19 at 03:41
  • @Joshua from language point of view registers don't exist - they are only a hidden implementation detail. If you try to take the address of a parameter passed by register, the compiler will store the value on stack, and refer to the address of that. – Michael Veksler Apr 18 '19 at 04:04
0

This is undefined behavior. Since the language does not require this to be passed as a parameter, it might not be passed at all.

For example, if a compiler can figure out that an object is a singleton, it may avoid passing this as a parameter and use a global symbol when the address of this is explicitly required (like in the case of va_start). In theory, the compiler might generate code to compensate that in va_start (after all, the compiler knows this is a singleton), but it is not required to do so by the standard.

Think of something like:

class single {
public:
   single(const single& )= delete;
   single &operator=(const single& )= delete;
   static single & get() {
       // this is the only place that can construct the object.
       // this address is know not later than load time:
       static single x;
       return x;
   }
  void print(...) {
      va_list args;
      va_start (args, this);
      vprintf ("%d\n", args);
      va_end (args);
}

private:
  single() = default;
};

Some compilers, like clang 8.0.0, emit a warning for the above code:

prog.cc:15:23: warning: second argument to 'va_start' is not the last named parameter [-Wvarargs] va_start (args, this);

Despite the warning, it runs ok. In general, this proves nothing, but it is a bad idea to have a warning .

Note: I don't know of any compiler that detects singletons and treats them specially, but the language does not forbid this kind of optimization. If it is not done today by your compiler, it might be done tomorrow by another compiler.

Note 2: despite all that, it might work in practice to pass this to va_start. Even if it works, it is not a good idea to do something that isn't guaranteed by the standard.

Note 3: The same singleton optimization can't be applied to parameters such as:

void foo(singleton * x, ...)

It can't be optimized away since it may have one of two values, point to the singleton or be nullptr. This means that this optimization concern does not apply here.

Michael Veksler
  • 8,217
  • 1
  • 20
  • 33
  • Isn't the hypothetical optimization you've presented also possible if the parameter is named? E.g. a `single*` which is ever dereferenced can be shown to come from `static single x` in `get()`, therefore it does not need to be passed to a regular free function `void f(single* s, ...) { *s; va_list vl; va_start(vl, s); va_end(vl); }`. Do you think this function `f` has undefined behavior too? – John Zwinck Apr 17 '19 at 15:02
  • @JohnZwinck no, it doesn't, since that would break va_start for an explicit parameter. The compiler is not allowed to perform optimizations which break legal code – Michael Veksler Apr 17 '19 at 15:02
  • 2
    You've basically made a circular argument now. – John Zwinck Apr 17 '19 at 15:04
  • You can't optimize away regular `single*` parameter, since it could either point to the singleton or be null. On the other hand `this` must point to the singleton, always. – Michael Veksler Apr 17 '19 at 18:27
  • You can optimize away a regular `single*` if it is dereferenced (because that means if it is null you have UB, so it didn't matter what you did). That's why I wrote the unused `*s`. – John Zwinck Apr 18 '19 at 04:54
  • @JohnZwinck I agree. There are cases when the compiler can optimize away a parameter. It does that regularly when inlining a function. Yet, va_start works, since inlining is disabled (or made to appear so) when va_start is present. Optimizing away `this` can also be disabled with variadic functions, but the compiler is not required to do so for undefined behavior (and va_start for `this` qualifies) – Michael Veksler Apr 18 '19 at 05:22