32

gcc 4.4.4 C89

I am just wondering what most C programmers do when they want to zero out memory.

For example, I have a buffer of 1024 bytes. Sometimes I do this:

char buffer[1024] = {0};

Which will zero all bytes.

However, should I declare it like this and use memset?

char buffer[1024];
.
.
memset(buffer, 0, sizeof(buffer));

Is there any real reason you have to zero the memory? What is the worst that can happen by not doing it?

Bo Persson
ant2009
  • 26
    You're the one who knows whether you need to zero the memory. Does your code expect zeroes or not? – Cascabel May 21 '10 at 17:38
  • 8
    For people who think the first will not zero out all elements: read the spec. – Mehrdad Afshari May 21 '10 at 17:56
  • Judging by the confusion evident in the answers, I would suggest that memset is the more *maintainable* option, even if both are equivalent. – Paul Nathan May 21 '10 at 18:02
  • I hope you understand that what "most programmers do" is not necessarily the right way to do it. The presence of archaic, deprecated and plainly incorrect techniques is still very much noticeable these days. Using `memset` and/or relying on `calloc` zeroing is almost always a bad programming practice, but you will still see it used pretty often. Most of the time because the author of the code simply didn't know how to do it properly. – AnT stands with Russia May 21 '10 at 18:03
  • 4
    Wow, that's a lot of upvoted wrong answers! – avakar May 21 '10 at 18:07
  • 4
    @Andrey, what exactly is the problem with `calloc`? – JSBձոգչ May 21 '10 at 18:12
  • 3
    @Andrey since when did calloc and memset become "archaic and deprecated"!? Please post that as an answer so I can downvote it :) – Earlz May 21 '10 at 18:26
  • @JS Bangs: The problem with `calloc` is the same as the problem with `malloc`: using it for zeroing any types besides integral ones is a dubious programming practice (to put it mildly). – AnT stands with Russia May 21 '10 at 18:36
  • @Earlz: Where did I specifically call `memset` and `calloc` archaic and deprecated? Now, using `memset` and `calloc` for general value-zeroing is indeed archaic. They became archaic in that role in 1989/1990 when ANSI C came into existence. – AnT stands with Russia May 21 '10 at 18:38
  • @Earlz: As for downvoting... The issue of using `memset` and `calloc` has been discussed and explained very well here on SO. So I suggest you do some search and learn a thing or two about C language, before you start going around downvoting something you have no idea about. I did a couple of answers on the topic myself, so you can downvote those, if you wish :) – AnT stands with Russia May 21 '10 at 19:03
  • @JS Bangs: I meant to say: "The problem with `calloc` is the same as the problem with `memset`...". Sorry for the typo. – AnT stands with Russia May 21 '10 at 20:39
  • @AnT, I did a little digging through SO, and found your answer to the subject, but can you make an example of data structure in C which will be initialized wrongly? – 0andriy Feb 20 '15 at 20:39

12 Answers

20

The worst that can happen? You end up (unwittingly) with a string that is not null-terminated, or an integer that inherits whatever happened to be to the right of it after you printed to part of the buffer. Then again, unterminated strings can arise in other ways too, even if you initialized the buffer.

Edit (from comments) The end of the world is also a remote possibility, depending on what you are doing.

Either is undesirable. However, unless you completely eschew dynamically allocated memory, statically allocated buffers are typically rather small, which makes memset() relatively cheap. In fact, it is much cheaper than most calls to calloc() for dynamic blocks, which tend to be bigger than ~2k.

C99 contains language regarding default initialization values. However, I can't seem to make gcc -std=c99 agree with that, using any kind of storage.

Still, with a lot of older compilers (and compilers that aren't quite C99) still in use, I prefer to just use memset().
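
To illustrate the unterminated-string risk mentioned at the top of this answer, here is a minimal sketch. The helper names `copy_terminated` and `is_terminated` are hypothetical and exist only for illustration. It relies on the documented behaviour that `strncpy` does not write a terminator when the source is at least as long as the count.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity dst_size bytes) and
 * always terminate the result, truncating if src does not fit. */
static void copy_terminated(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    strncpy(dst, src, dst_size - 1); /* may stop without writing '\0' */
    dst[dst_size - 1] = '\0';        /* so terminate explicitly */
}

/* Hypothetical helper: returns 1 if buf[0..size-1] contains a '\0'. */
static int is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

Zeroing the buffer first and copying at most `size - 1` bytes would give the same guarantee. Either way, the terminator has to come from somewhere explicit.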

Tim Post
  • 14
    "What's the worst that can happen"? A non-null-terminated string or buffer overflow can overwrite key data, smash your stack pointers, lead to security holes, get you fired, eat your children, cause famine in Africa, incite nuclear war, and summon the dread Cthulhu. The wise programmer protects himself from them with every weapon he has. Also, I'm pretty sure that no commercial C compiler actually implements that part of the spec except maybe in debug builds. Uninitialized variables get uninitialized memory. – JSBձոգչ May 21 '10 at 21:29
  • 2
    @JS Bangs - I can assure you that my children and warheads will not suffer from this fate. On an unrelated note, how much coffee or overly caffeinated drinks (in liters or gallons) have you consumed in the last 24 hours? – Tim Post May 21 '10 at 21:37
  • @JS Bangs can you define 'unlimited' ? especially for static allocation? – Tim Post May 21 '10 at 22:12
  • @Tim: My copy of C99 has no section 8.5.1 - can you post a snippet of what it might say about initializing to something consistent? As far as I know, only statically allocated objects get initialized to zero; automatic variables (most locals) do not get the promise of any kind of initialization in C unless explicitly initialized by the programmer. – Michael Burr May 22 '10 at 07:04
  • @Michael Burr - I misquoted. I'll find the exact passage and edit again once I have. I _know_ I read that uninitialized storage (it could have been specific to static) shall be initialized by the compiler in a consistent manner. 8.5.1 does deal with initialization of members, see http://gcc.gnu.org/ml/gcc-bugs/2000-07/msg00639.html , but I can't find the reference that I'm looking for. – Tim Post May 22 '10 at 08:14
  • @Tim: the 8.5.1 reference is from the C++ standard (I should have guessed). As far as C99 goes, 6.7.8/10 (Initialization) says, "If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate". That part of the standard then goes on to detail how static duration objects without explicit initialization are initialized (which is basically to set pointers to the null pointer value and zero initialize everything else). – Michael Burr May 22 '10 at 15:10
  • memset is not needed if you do this on global variables (let's leave beside the topic about how bad / good to have global variables). – 0andriy Feb 20 '15 at 20:17
14

I vastly prefer

char buffer[1024] = { 0 };

It's shorter, easier to read, and less error-prone. Only use memset on dynamically-allocated buffers, and then prefer calloc.
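
As a small sketch of why `= { 0 }` is sufficient: elements without an explicit initializer are set to zero, so the whole array is zeroed, not just the first element. The function names below are hypothetical and only serve the illustration.

```c
#include <stddef.h>

/* Returns 1 if every byte in buf[0..size-1] is zero, 0 otherwise. */
static int all_zero(const unsigned char *buf, size_t size)
{
    size_t i;
    for (i = 0; i < size; i++) {
        if (buf[i] != 0) {
            return 0;
        }
    }
    return 1;
}

/* Returns 1 if = { 0 } zeroed the entire local buffer. */
static int brace_init_zeroes_everything(void)
{
    char buffer[1024] = { 0 };
    return all_zero((const unsigned char *)buffer, sizeof buffer);
}

/* Returns 1 if the elements after a partial initializer are zero. */
static int partial_init_zeroes_rest(void)
{
    int values[5] = { 1, 2 };
    return values[0] == 1 && values[1] == 2 &&
           values[2] == 0 && values[3] == 0 && values[4] == 0;
}
```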

JSBձոգչ
  • 1
    It's shorter, easier to type, and less prone to error. However, it isn't as obvious if you forget the rules for non-explicitly initialized elements of arrays (I admit being one of the guilty). So the alternative might introduce error if you pick the wrong bounds, but even a person who hasn't had his morning coffee won't be led astray by what a memset() is doing. Does this mean coding to a lower denominator of programmer? Perhaps, but my defensive style never allows for much "default" initialization. If you don't use default initialization, you forget the rules. – Edwin Buck May 21 '10 at 18:13
  • @Edwin Buck: If you forget the rules for non-explicitly initialized elements, what's the worst that will happen? This isn't really a downside. – jamesdlin May 21 '10 at 20:53
  • 1
    The worst that will happen is that someone else will mess with working (and correct) code thinking that only the first element is being initialized. It's not that the code will contain any issues as you write it, it's that the "other" person might mess it up trying to "help" you in fixing something that's not broken. – Edwin Buck May 21 '10 at 21:22
  • 3
    @Edwin Buck: And even then, it's likely that the other programmer will write a correct version with `memset`, which is likely to be fine anyway. Furthermore, other programmers in general shouldn't change things that aren't broken, and if *you* don't use it in your code, then you're perpetuating the problem of other people not being as familiar with the idiom. – jamesdlin May 22 '10 at 05:32
  • 3
    The rule is very simple to remember, once you know it: "Objects are never partially initialised in C". So anything (array, struct, etc) is either completely initialised, or not initialised at all. – caf May 22 '10 at 09:39
  • It makes a lot more sense to write "char buffer[1024] = {}" - then there is less confusion, everything will still be initialized, but no one will think that only the first element is initialized. – Stefan Monov May 22 '10 at 16:48
  • @Stefan Monov: I think that's legal only in C++, not in standard C. – jamesdlin May 22 '10 at 17:55
13

When you define char buffer[1024] without initializing, you're going to get undefined data in it. For instance, Visual C++ in debug mode will initialize with 0xcd. In Release mode, it will simply allocate the memory and not care what happens to be in that block from previous use.

Also, your examples demonstrate runtime vs. compile time initialization. If your char buffer[1024] = { 0 } is a global or static declaration, it will be stored in the binary's data segment with its initialized data, thus increasing your binary size by about 1024 bytes (in this case). If the definition is in a function, it's stored on the stack and is allocated at runtime and not stored in the binary. If you provide an initializer in this case, the initializer is stored in the binary and an equivalent of a memcpy() is done to initialize buffer at runtime.

Hopefully, this helps you decide which method works best for you.
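
For reference, the language rule behind the static case (C99 6.7.8/10, quoted in the comments on another answer) is that objects with static storage duration are zero-initialized even without an initializer, while automatic objects are indeterminate unless initialized. A minimal sketch, with hypothetical names:

```c
#include <stddef.h>

/* Static storage duration, no initializer: zero-initialized by the language. */
static char static_buffer[1024];

/* Returns 1 if static_buffer holds only zero bytes. */
static int static_buffer_is_zero(void)
{
    size_t i;
    for (i = 0; i < sizeof static_buffer; i++) {
        if (static_buffer[i] != 0) {
            return 0;
        }
    }
    return 1;
}

/* Returns 1 if an automatic buffer with an initializer is zeroed on entry.
 * Without the initializer its contents would be indeterminate. */
static int local_buffer_is_zero(void)
{
    char local_buffer[1024] = { 0 };
    size_t i;
    for (i = 0; i < sizeof local_buffer; i++) {
        if (local_buffer[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```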

spoulson
8

In this particular case, there's not much difference. I prefer = { 0 } over memset because memset is more error-prone:

  • It provides an opportunity to get the bounds wrong.
  • It provides an opportunity to mix up the arguments to memset (e.g. memset(buf, sizeof buf, 0) instead of memset(buf, 0, sizeof buf)).
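
If memset is used anyway, one way to reduce both risks is to wrap the call so the argument order and the size are written only once. `ZERO_ARRAY` below is a hypothetical macro, not a standard one, and it only works on true arrays: passed a pointer, `sizeof` would yield the pointer size instead of the buffer size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero a true array (NOT a pointer). */
#define ZERO_ARRAY(arr) memset((arr), 0, sizeof (arr))

/* Returns 1 if ZERO_ARRAY cleared a buffer that held non-zero bytes. */
static int zero_array_demo(void)
{
    char buf[16];
    size_t i;

    memset(buf, 'x', sizeof buf); /* simulate stale contents */
    ZERO_ARRAY(buf);

    for (i = 0; i < sizeof buf; i++) {
        if (buf[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```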

In general, = { 0 } is better for initializing structs too. It effectively initializes all members as if you had written = 0 to initialize each. This means that pointer members are guaranteed to be initialized to the null pointer (which might not be all-bits-zero, and all-bits-zero is what you'd get if you had used memset).

On the other hand, = { 0 } can leave padding bits in a struct as garbage, so it might not be appropriate if you plan to use memcmp to compare them later.
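
A minimal sketch of both points, using a hypothetical `struct record`: `= { 0 }` gives every member its zero value (a null pointer for the pointer member), and comparing member by member avoids depending on padding contents the way memcmp would.

```c
#include <stddef.h>

/* Hypothetical struct; padding is likely between tag and count. */
struct record {
    char   tag;
    int    count;
    char  *name;
    double ratio;
};

/* Returns 1 if = { 0 } gave every member its zero value. */
static int struct_brace_init_demo(void)
{
    struct record r = { 0 };
    return r.tag == 0 && r.count == 0 && r.name == NULL && r.ratio == 0.0;
}

/* Member-wise comparison: does not read padding, unlike memcmp. */
static int record_equal(const struct record *a, const struct record *b)
{
    return a->tag == b->tag && a->count == b->count &&
           a->name == b->name && a->ratio == b->ratio;
}
```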

jamesdlin
4

The worst that can happen by not doing it is that you write some data in character by character and later interpret it as a string (and you didn't write a null terminator). Or you end up failing to realise a section of it was uninitialised and read it as though it were valid data. Basically: all sorts of nastiness.

Memset should be fine (provided you correct the sizeof typo :-)). I prefer that to your first example because I think it's clearer.

For dynamically allocated memory, I use calloc rather than malloc and memset.
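
As a sketch of the two dynamic-allocation options, assuming plain byte buffers (the function names are hypothetical):

```c
#include <stdlib.h>
#include <string.h>

/* One call: calloc returns memory with all bytes set to zero. */
static unsigned char *alloc_zeroed_calloc(size_t count)
{
    return calloc(count, 1);
}

/* Two steps: malloc, then memset, with the NULL check in between. */
static unsigned char *alloc_zeroed_memset(size_t count)
{
    unsigned char *p = malloc(count);
    if (p != NULL) {
        memset(p, 0, count);
    }
    return p;
}
```

Both return NULL on allocation failure, and the caller releases the memory with free().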

Vicky
4

One of the things that can happen if you don't initialize is that you run the risk of leaking sensitive information.

Uninitialized memory may have something sensitive in it from a previous use of that memory. Maybe a password or crypto key or part of a private email. Your code may later transmit that buffer or struct somewhere, or write it to disk, and if you only partially filled it the rest of it still contains those previous contents. Certain secure systems require zeroizing buffers when an address space can contain sensitive information.
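
A minimal sketch of the partially filled buffer case, assuming a hypothetical fixed-size message of `MSG_SIZE` bytes that is later sent or written out in full. Zeroing before filling means the unused tail carries zeros rather than whatever was there before.

```c
#include <stddef.h>
#include <string.h>

#define MSG_SIZE 32 /* hypothetical fixed message size */

/* Hypothetical helper: fill out[0..MSG_SIZE-1] with text, zero-padded.
 * Text longer than MSG_SIZE - 1 bytes is truncated. */
static void build_message(char *out, const char *text)
{
    size_t len = strlen(text);
    if (len > MSG_SIZE - 1) {
        len = MSG_SIZE - 1;
    }
    memset(out, 0, MSG_SIZE); /* clear stale contents first */
    memcpy(out, text, len);   /* then write only the payload */
}
```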

progrmr
3

I prefer using memset to clear a chunk of memory, especially when working with strings. I want to know without a doubt that there will be a null delimiter after my string. Yes, I know you can append a \0 on the end of each string and some functions do this for you, but I want no doubt that this has taken place.

A function could fail when using your buffer, and the buffer remains unchanged. Would you rather have a buffer of unknown garbage, or nothing?

  • 1
    The first example is not compiler dependent - it's standard C. – Vicky May 21 '10 at 17:45
  • @Vicky - Thanks for the correction, I wasn't 100% sure on the first example. –  May 21 '10 at 17:48
  • Anonymous downvoter, if something I've said in my answer is incorrect and not what I have already rectified, I would really like to know what it is. –  May 21 '10 at 17:58
  • Is there a reason you prefer `memset` to `={0}`? – avakar May 21 '10 at 18:02
  • @avakar - To be honest, I wasn't fully aware of what `={0}` did. Learning C in college, this syntax was never used and never taught, we were always taught to always use `memset`. After reading the responses here, I have no logical reason to use one over the other, and may start using the first one to avoid a call to `memset`. Not sure if that would be any less expensive... –  May 21 '10 at 18:16
  • 2
    It is not about "expensive"/"inexpensive". It is about the simple fact that `memset` is a generally *invalid* approach, while `= { 0 }` is always valid. – AnT stands with Russia May 21 '10 at 19:16
2

Depends how you're filling it: if you're planning on writing to it before even potentially reading anything, then why bother? It also depends what you're going to use the buffer for: if it's going to be treated as a string, then you just need to set the first byte to \0:

char buffer[1024];
buffer[0] = '\0';

However, if you're using it as a byte stream, then the contents of the entire array are probably going to be relevant, so memset-ing the entire thing or setting it to { 0 } as in your example is a smart move.

Samir Talwar
2

This post has been heavily edited to make it correct. Many thanks to Tyler McHenery for pointing out what I missed.

char buffer[1024] = {0};

Will set the first char in the buffer to null, and the compiler will then expand all non-initialized chars to 0 too. In such a case it seems that the differences between the two techniques boil down to whether the compiler generates more optimized code for array initialization or whether memset is optimized faster than the generated compiled code.

Previously I stated:

char buffer[1024] = {0};

Will set the first char in the buffer to null. That technique is commonly used for null terminated strings, as all data past the first null is ignored by subsequent (non-buggy) functions that handle null terminated strings.

Which is not quite true. Sorry for the miscommunication, and thanks again for the corrections.

Community
Edwin Buck
  • 7
    `char buffer[1024] = {0}` does indeed set *all* elements to `\0`. There's a rule where if you give an array an initializer but the initializer is shorter than the array, the rest of the array is set to zero bytes. – Tyler McHenry May 21 '10 at 17:45
  • 1
    The first one will set the whole buffer to 0 too, not just the first char. – Vicky May 21 '10 at 17:46
  • 2
    Careful with your vocabulary. `NULL` is not necessarily equal to 0. – mbauman May 22 '10 at 05:11
1

I also use memset(buffer, 0, sizeof(buffer));

The risk of not using it is that there is no guarantee that the buffer you are using is completely empty. It might contain garbage, which may lead to unpredictable behavior.

Always memset-ing to 0 after malloc is a very good practice.

LoudNPossiblyWrong
1

Yup, the calloc() function declared in stdlib.h allocates memory initialized with zeros.

Adil Mehmood
-2

I'm not familiar with the:

char buffer[1024] = {0};

technique. But assuming it does what I think it does, there's a (potential) difference to the two techniques.

The first one is done at COMPILE time, and the buffer will be part of the static image of the executable, and thus be 0's when you load.

The latter will be done at RUN TIME.

The first may incur some load time behaviour. If you just have:

char buffer[1024];

the modern loaders may well "virtually" load that...that is, it won't take any real space in the file, it'll simply be an instruction to the loader to carve out a block when the program is loaded. I'm not comfortable enough with modern loaders to say whether that's true or not.

But if you pre-initialize it, then that will certainly need to be loaded from the executable.

Mind, neither of these have "real" performance impacts in the small. They may not have any in the "large". Just saying there's potential here, and the two techniques are in fact doing something quite different.

Will Hartung
  • This isn't true. `char buffer[1024] = { 0 };` is allocated on the stack at runtime. The compiler may even translate this into a call to `memset`. – JSBձոգչ May 21 '10 at 17:47
  • 1
    `char buffer[1024] = {0}` can only potentially be done at compile time if `buffer` is global or static (and then it would be done automatically even if you left off the initializer). If `buffer` is a local variable, it's on the stack, which means that it must be initialized to zero at run-time, since the contents of the stack at the time of a function call are undetermined until it actually happens. – Tyler McHenry May 21 '10 at 17:48