
Often, the valid initialization for a data structure is to set all of its members to zero. Even when programming in C++, one may need to interface with an external API for which this is the case.

Is there any practical difference between:

```
some_struct s;
memset(&s, 0, sizeof(s));
```

and simply

```
some_struct s = { 0 };
```

Do folks find themselves using both, with a method for choosing which is more appropriate for a given application? (Hopefully it is understood that this is only currently applicable to POD structures; you'd get all sorts of havoc if there was a C++ std::string in that structure.)

For myself, as mostly a C++ programmer who doesn't use memset much, I'm never certain of the function signature, so I find the second example easier to use. It is also less typing, more compact, and maybe even more obvious, since it says "this object is initialized to zero" right in the declaration rather than leaving you to read the next line of code and think, "oh, this object is zero-initialized."

When creating classes and structs in C++ I tend to use initialization lists. I'm curious about folks' thoughts on the two "C style" initializations above, rather than a comparison against what is available in C++, since I suspect many of us interface with C libraries even if we code mostly in C++ ourselves.

Edit: Neil Butterworth posed this question in a follow-up, which I believe is an interesting corollary to this question.

dash-tom-bang
  • In C++, zero initialization doesn't serve much functional use, since the constructor will ensure that the type is ready to go. – Puppy May 14 '10 at 22:46
  • @DeadMG: What does the constructor of a POD type do? – dash-tom-bang May 14 '10 at 22:54
  • @dash-tom-bang: It does absolutely nothing. – AnT stands with Russia May 14 '10 at 22:58
  • @DeadMG: Just because it is C++ does not mean every type has (or is supposed to have) a constructor. And even it does have a constructor, it does not mean that that constructor is doing anything useful. – AnT stands with Russia May 14 '10 at 23:19
  • @Andrey: that was my point. ;) – dash-tom-bang May 14 '10 at 23:28
  • @AnT - what type doesn't have a constructor? – jterm Aug 03 '18 at 04:42
  • @jterm Any non-class type. `int` for example. – AnT stands with Russia Aug 03 '18 at 05:12
  • @AnT - even POD types have constructors, they simply cannot have user-defined ones. – jterm Aug 03 '18 at 21:35
  • @jterm: POD has nothing to do with it. Again, only *class* types have constructors in C++, regardless of whether they are POD or non-POD. *Non-class* types cannot possibly have constructors in C++. – AnT stands with Russia Aug 03 '18 at 21:45
  • @AnT - Yes they absolutely have constructors. What do you think `int val(4);` does? – jterm Aug 08 '18 at 20:45
  • @jterm: No, they don't. This syntax is called *declaration with initializer*. The process triggered by the `(4)` part is called *initialization*. And it is a very popular beginner's mistake to assume that this is a "constructor call". It isn't. *Initialization* is a fairly complex and multi-variant procedure in C++. It works differently for different kinds of types. For class types, for example, it might (but doesn't have to) invoke constructors. For non-class types it works without any constructors, since non-class types have none. Don't get confused by the *syntax*. Syntax means nothing. – AnT stands with Russia Aug 08 '18 at 21:07
  • @AnT - Built-in types have what looks, acts, and feels like a constructor. The minute difference is that there is no function call; the memory is just initialized. This is not a "beginner's mistake" as you said. Stroustrup himself refers to built-in types having constructors in his book while still pointing out that the implementation is slightly different than class constructors. I'm not even sure what point you're trying to drive home, but for all intents and purposes you can refer to built-in types as having constructors, and it is a bit of confusing minutia to tease out the difference. – jterm Aug 09 '18 at 02:55
  • @jterm: Wrong. They don't act like constructors. See for yourself: `int i = int();`. This declaration is guaranteed to initialize `i` with zero. At the surface this looks like a "default constructor" working and that default constructor initializes `int` with zero. However, if this is a default constructor, then it should also do its thing even when we simply declare `int i;`. However, you know perfectly well that a local `int i;` declaration leaves `i` with a garbage value. – AnT stands with Russia Aug 09 '18 at 03:12
  • So, if `int` has a default constructor, how come it works in the `int i = int();` case, but suddenly stops working in the `int i;` case? This is just an illustration of the holes in your logic. However, the language specification is very clear and explicit in this regard: only class types have constructors, non-class types don't have constructors. End of debate. What Stroustrup says in his book is incorrect. Initially people assumed that he made a deliberate mistake to simplify the text, but there's some evidence that he's simply not up-to-date with this part of the language specification. – AnT stands with Russia Aug 09 '18 at 03:16
  • And the point I'm trying to drive home is that at the "pidgin C++" level the assumption that `int` has constructors is "good enough" for many intents and purposes. In that regard it is quite similar to that popular assumption that "arrays are just pointers". Even though these assumptions are wrong, many C++ users out there manage to live and work while being blissfully unaware of that. You can get by writing simplistic C++ programs under these incorrect assumptions and probably never have any issues. But once you get into real, more advanced C++, understanding these nuances becomes very important. – AnT stands with Russia Aug 09 '18 at 03:25

12 Answers

memset is practically never the right way to do it. And yes, there is a practical difference (see below).

In C++ not everything can be initialized with literal 0 (objects of enum types can't be), which is why in C++ the common idiom is

```
some_struct s = {};
```

while in C the idiom is

```
some_struct s = { 0 };
```

Note that in C, = { 0 } is what can be called the universal zero initializer. It can be used with objects of virtually any type, since {}-enclosed initializers are allowed with scalar objects as well:

```
int x = { 0 }; /* legal in C (and in C++) */
```

which makes the = { 0 } useful in generic type-independent C code (type-independent macros for example).
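In C++, the corresponding universal form is `= {}`, which also covers the enum case mentioned at the start of this answer. A minimal sketch, assuming a hypothetical `Color` enum (with no enumerator equal to 0) and a hypothetical `Pixel` struct whose first member is that enum:

```cpp
// Hypothetical enum with no enumerator equal to 0.
enum Color { Red = 1, Green = 2 };

struct Pixel {
    Color color;  // first member is an enum
    int x;
    int y;
};

// Pixel bad = { 0 };  // rejected by C++ compilers: int 0 does not convert to Color

Pixel make_pixel() {
    Pixel p = {};  // no initializers at all: every member is value-initialized,
                   // so color holds the value 0 and x, y are 0
    return p;
}
```

Because no initializer is given at all, the enum member is value-initialized rather than converted from `int`, so the compiler has nothing to reject.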

The drawback of the = { 0 } initializer in C89/90 and C++ is that it can only be used as part of a declaration. (C99 fixed this problem by introducing compound literals. Similar functionality is coming to C++ as well.) For this reason you might see many programmers use memset to zero something out in the middle of C89/90 or C++ code. Yet, I'd say that the proper way to do it is still without memset, but rather with something like

```
some_struct s;
/* ... */
{
  const some_struct ZERO = { 0 };
  s = ZERO;
}
/* ... */
```

i.e. by introducing a "fictive" block in the middle of the code, even though it might not look too pretty at first sight. Of course, in C++ there's no need to introduce a block.
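For the C++ case, a sketch of zeroing an existing object in the middle of a function without a named `ZERO` constant, using the `s = some_struct();` form raised in the comments on this answer and the C++11 `s = {};` form. The struct and both function names are hypothetical:

```cpp
// Hypothetical POD struct, standing in for the question's some_struct.
struct some_struct {
    int a;
    double b;
    char c[8];
};

// Assigning a value-initialized temporary zeroes every member; this form
// works in C++98 as well, with no separate ZERO object or extra block.
void reset(some_struct& s) {
    s = some_struct();
}

// C++11 and later: assigning an empty braced list has the same effect
// for an aggregate like this one.
void reset_cxx11(some_struct& s) {
    s = {};
}
```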

As for the practical difference... You might hear some people say that memset will produce the same results in practice, since in practice the physical all-zero bit pattern is what is used to represent zero values for all types. However, this is generally not true. An immediate example that would demonstrate the difference in a typical C++ implementation is a pointer-to-data-member type

```
struct S;
// ...

int S::*p = { 0 };
assert(p == NULL); // this assertion is guaranteed to hold

memset(&p, 0, sizeof p);
assert(p == NULL); // this assertion will normally fail
```

This happens because a typical implementation usually uses the all-one bit pattern (0xFFFF...) to represent the null pointer of this type. The above example demonstrates a real-life practical difference between a zeroing memset and a normal = { 0 } initializer.
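A compilable sketch of the same experiment, with `S` completed so the results can be inspected. The struct members and the two helper functions are hypothetical; only the first function's result is guaranteed by the language, while the second depends on the ABI:

```cpp
#include <cstring>

struct S {
    int first;
    int second;
};

// Initialization from { 0 } yields a null member pointer; the language
// guarantees this regardless of how null is represented.
bool initialized_is_null() {
    int S::*p = { 0 };
    return p == nullptr;
}

// memset writes all-bits-zero instead. Under the Itanium C++ ABI (used by
// GCC and Clang on Linux and macOS) a data member pointer holds a byte
// offset and null is encoded as -1, so zero bytes mean "offset 0", which
// is &S::first rather than a null pointer.
bool memset_is_null() {
    int S::*p;
    std::memset(&p, 0, sizeof p);
    return p == nullptr;
}
```

On GCC and Clang targets that follow the Itanium C++ ABI, `memset_is_null()` returns `false`.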

AnT stands with Russia
  • @AndreyT: `"Of course, in C++ there's no need to introduce a block"`, Why? Also, why do you want to introduce the block in C? – Lazer May 15 '10 at 21:01
  • @eSKay: Because C++ allows one to add declarations in the middle of the code. There's no need for a block. C99 allows that as well, but as I said above in C99 you have a better option: compound literals. The only case that's left uncovered is C89/90. In C89/90 you can't just declare a variable in the middle of the code. You need a block. This is why I want to introduce a block in C (implying C89/90). – AnT stands with Russia May 16 '10 at 02:36
  • @AndreyT: thanks! never knew that "in C89/90 you can't just declare a variable in the middle of the code". But I tested [this code](http://codepad.org/nx3vzW1o) using `gcc -std=c89 check89.c`, and it compiles and runs fine! – Lazer May 16 '10 at 05:38
  • @eSKay: GCC is well-known for quietly taking quite a few liberties with the language. If you want GCC to at least *resemble* standard C, you need to run it with `-ansi -pedantic-errors` settings. Just `-std=c89` is not enough. – AnT stands with Russia May 16 '10 at 07:07
  • @AndreyT: thanks! `error: ISO C90 forbids mixed declarations and code` – Lazer May 16 '10 at 08:38
  • @AndreyT, interesting how different people interpret answers differently :) I thought the reason you introduced the block is to reduce visibility of `ZERO`, and that the reason you said it's not needed in C++ is because you can say `s = some_struct();` :) To get rid of other interpretational issues, the feature you say is coming to C++ is `s = { }`, right? It's the new unified initializers thing of C++0x. – Johannes Schaub - litb May 16 '10 at 11:23
  • @litb: I have to admit, the `s = some_struct()` variant somehow slipped my mind. Indeed, in C++ one doesn't really need that `ZERO` trick. Reducing visibility of ZERO is certainly important to keep in mind as well. – AnT stands with Russia May 16 '10 at 17:56
  • Interesting, I didn't know that most compilers use the all-ones bit pattern for representing null class member function pointers (though I did know that those had lots of weirdness -- see http://www.codeproject.com/kb/cpp/FastDelegate.aspx ) – Adam Rosenfield Feb 03 '11 at 23:41
  • @Adam Rosenfield: Not member **function** pointers. Null pointer of member function pointer type is usually represented by all-zero pattern, i.e. nothing weird here. It is **data** member pointers that use all-ones pattern for null pointers. – AnT stands with Russia Feb 03 '11 at 23:57

some_struct s = { 0 }; is guaranteed to work; memset relies on implementation details and is best avoided.

Mark Ransom
  • If this is about C++, then no, it will *not* work when the first member of `some_struct` is an enum object. This is one of the reasons the `= {}` initializers were introduced in C++. – AnT stands with Russia May 14 '10 at 22:37

If the struct contains pointers, the value of all bits zero as produced by memset may not mean the same as assigning a 0 to it in the C (or C++) code, i.e. a NULL pointer.

(It might also be the case with floats and doubles, but I've never encountered that. However, I don't think the standards guarantee them to become zero with memset either.)

Edit: From a more pragmatic perspective, I'd still say not to use memset when it can be avoided, as it is an additional function call, longer to write, and (in my opinion) less clear in intent than = { 0 }.

Arkku
  • The C and C++ standards don't require all bits zero to turn into 0.0 as a floating point number, but the IEEE standards do. There are machines that don't follow the IEEE standards, but all of them of which I'm aware convert all bits zero to 0.0 anyway. – Jerry Coffin May 14 '10 at 21:57
  • The question asked for *practical* differences. Practically speaking, null pointers are always all-bits-zero, and so are the zero values of the floating-point types. – Rob Kennedy May 14 '10 at 21:57
  • On the platforms with which I am familiar, all use all-bits-zero to mean NULL, 0, and 0.0, but your point is well taken. – dash-tom-bang May 14 '10 at 22:00
  • @Rob Kennedy: Well, there are machines with non-zero representations for null pointers so it is a practical difference on those machines. Of course, we may consider such machines impractical. =) – Arkku May 14 '10 at 22:04
  • @Rob Kennedy: Incorrect. Pointers-to-data-members, like `int S::*p` for example usually *never* use all-zero bit pattern for null pointers. They use all-one bit pattern instead (0xFFF...), which is one *practical* example when you will not obtain a null-pointer with `memset(..., 0, ...)`. The funny part is that this is the case on our everyday machines and implementations: G++, MSVC++. It's been living under our noses all this time. Nothing exotic about it. – AnT stands with Russia May 14 '10 at 22:42

Depending on the compiler optimization, there may be some threshold above which memset is faster, but that would usually be well above the normal size of stack based variables. Using memset on a C++ object with a virtual table is of course bad.
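A minimal sketch of one way to enforce the "no memset on objects with a vtable" rule at compile time, using `std::is_trivially_copyable` from `<type_traits>` (C++11). The helper name is hypothetical, and passing the check does not make all-bits-zero a valid null pointer-to-data-member:

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical helper: refuses at compile time to memset a type that is not
// trivially copyable. That excludes classes with virtual functions (the
// vtable pointer would be overwritten) and classes with members such as
// std::string.
template <typename T>
void checked_memset_zero(T& obj) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memset is only safe on trivially copyable types");
    std::memset(&obj, 0, sizeof obj);
}
```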

drawnonward
  • Using `memset` on any non-POD is bad. Also, if `memset` is faster for initializing something, I don't see why the compiler wouldn't then optimize `={}` to such a thing. – GManNickG May 14 '10 at 22:11

I found a good solution to be:

```
template<typename T> void my_zero(T& e) {
    static T dummy_zero_object;  // static storage: zero-initialized before its constructor runs
    e = dummy_zero_object;
}

my_zero(s);
```

This does the right thing not only for fundamental types and user-defined types, but it also zero-initializes types for which the default constructor is defined but does not initialize all member variables, especially classes containing non-trivial union members.
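A small check of that claim, assuming a hypothetical `Partial` class whose constructor sets only one member. The helper is repeated so the block compiles on its own:

```cpp
// The answer's helper, unchanged: a function-local static is zero-initialized
// before any constructor runs, then copied into the target.
template <typename T>
void my_zero(T& e) {
    static T dummy_zero_object;
    e = dummy_zero_object;
}

// Hypothetical class whose default constructor sets only one member.
struct Partial {
    int set_by_ctor;
    int left_alone;
    Partial() : set_by_ctor(7) {}  // left_alone is not touched here
};
```

The trade-off is that `T` must be default-constructible and copy-assignable, and any members the constructor does set keep the constructor's values rather than becoming zero.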

Hugues

The only practical difference is that the ={0}; syntax is a bit clearer about saying "initialize this to be empty" (at least it seems clearer to me).

Purely theoretically, there are a few situations in which memset could fail, but as far as I know, they really are just that: theoretical. OTOH, given that it's inferior from both a theoretical and a practical viewpoint, I have a hard time figuring out why anybody would want to use memset for this task.

Jerry Coffin
  • `memset` fails over `={}` in real environments. (I don't know what the names are, but I know they exist. And they are modern.) – GManNickG May 14 '10 at 22:10
  • @GMan: I'm aware of a few environments in which the usual representation of a null pointer does not have all bits zero -- but at least in those of which I'm aware, all-bits-zero *does* create a null pointer as well. Of course, there are machines I haven't used. As I said, even though I can't point at it failing, I can't conceive of a good reason to use `memset` either. – Jerry Coffin May 14 '10 at 22:22
  • You haven't been looking too carefully. Virtually all popular C++ implementations (G++ included, for example) use **all-ones** bit pattern for null pointers of *pointer-to-data-member* types in C++. Like `int S::*p` for some class type `S`. You will not obtain a null pointer by doing a `memset(&p, 0, sizeof p)` with such a pointer. – AnT stands with Russia May 14 '10 at 22:46
  • There is no guarantee (at least in C) that a floating point value of 0.0 has to be represented by all bits zero with in a float or double. – Randy Howard Sep 19 '13 at 19:10
  • @RandyHoward: Yes, thus the emphasis on "practical" -- even though all bits 0 isn't guaranteed to give a value of 0.0, finding real hardware for which all bits zero produces a non-zero floating point value is fairly difficult (to put it mildly). – Jerry Coffin Sep 19 '13 at 19:16

> Hopefully it is understood that this is only currently available for POD structures; you'd get a compiler error if there was a C++ std::string in that structure.

No, you won't. If you use memset on such a struct, at best you will just crash, and at worst you will get some gibberish. The = { } way can be used perfectly fine on non-POD structs, as long as they are aggregates. The = { } way is the best way to take in C++. Please note that there is no reason in C++ to put that 0 in it, nor is it recommended, since it drastically reduces the cases in which it can be used:

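```
#include <string>

struct A {
  std::string a;
  int b;
};

int main() {
  A a1 = { 0 };  // a1.a is built from 0, a null const char*: undefined behavior
                 // (C++23 standard libraries may reject this at compile time)
  A a2 = { };    // every member value-initialized: a2.a is empty, a2.b is 0
}
```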

The first will not do what you want: It will try to create a std::string from a C-string given a null pointer to its constructor. The second, however, does what you want: It creates an empty string.

Johannes Schaub - litb
  • Ahh fair enough; I shall delete that edit. Letting my memory of "non-aggregates cannot be initialized with initializer list" errors cloud my thinking... Of course, any object which has a single argument (non-explicit?) ctor will have this behavior. In the case of std::string, happily doing string ops from NULL. – dash-tom-bang May 14 '10 at 23:03

I've never understood the mysterious goodness of setting everything to zero, which, even if it is defined, seems unlikely to be desirable. As this is tagged as C++, the correct solution to initialisation is to give the struct or class a constructor.
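A minimal sketch of that suggestion, using a hypothetical `Account` type. As the comments point out, this only helps for types you define yourself, not for structs declared by a C library:

```cpp
#include <string>

// Hypothetical record type: the constructor spells out the intended initial
// state member by member, instead of relying on every byte being zero.
struct Account {
    std::string owner;
    int age;
    double balance;

    Account() : owner(), age(0), balance(0.0) {}
};
```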

  • Seems like a tough row to hoe when you're dealing with 3rd-party libraries written in C, to which you don't have the source code. – dash-tom-bang May 14 '10 at 22:12
  • It's less arbitrary if your struct contains pointers owned by the struct, since then a cleanup function can call `free` (or `delete`) unconditionally. – jamesdlin May 14 '10 at 22:56

I think the initialization says much more clearly what you are actually doing: you are initializing the struct. When the new standard is out, this way of initializing will be used even more (initializing containers with {} is something to look forward to). The memset way is slightly more error-prone and does not communicate as clearly what you are doing. That might not count for much while programming alone, but it means a great deal when working in a team.

For some people working with c++, memset, malloc & co. are quite esoteric creatures. I have encountered a few myself.

daramarak
  • I agree- I can't wait, with the asterisk that I'm worried about what the compiler vendors drop on us when that happens. (Speaking as someone who works on "esoteric" platforms with a small market for compiler vendors.) – dash-tom-bang May 14 '10 at 22:14

The best method for clearing structures is to set each field individually:

```
#include <string>

struct MyStruct
{
  std::string name;
  int age;
  double checking_account_balance;

  // Resets every member to a known value, one field at a time.
  void clear()
  {
     name.erase();
     age = 0;
     checking_account_balance = 0.0;
  }
};
```

In the above example, a clear method is defined to set all the members to a known state or value. The memset and std::fill methods may not work due to std::string and double types. A more robust program clears each field individually.

I prefer having a more robust program than spending less time typing.

Thomas Matthews

The bzero function is another option.

```
#include <strings.h>
void bzero(void *s, size_t n);
```

maerics
  • The `bzero` is not standard and may not be available on all platforms. From Harbison & Steele: "The more restricted function `bzero` ... is found in some UNIX implementations." It doesn't state that the function is available in all implementations. – Thomas Matthews May 14 '10 at 22:07

In C I prefer using {0,} to the equivalent memset(). However gcc warns about this usage :( Details here: http://www.pixelbeat.org/programming/gcc/auto_init.html

In C++ they're usually equivalent, but as always with C++ there are corner cases to consider (noted in other answers).

pixelbeat