8

malloc is not guaranteed to return 0'ed memory. The conventional wisdom is not only that, but that the contents of the memory malloc returns are actually non-deterministic, e.g. openssl used them for extra randomness.

However, as far as I know, malloc is built on top of brk/sbrk, which do "return" 0'ed memory. I can see why the contents of what malloc returns may be non-0, e.g. from previously free'd memory, but why would they be non-deterministic in "normal" single-threaded software?

  1. Is the conventional wisdom really true (assuming the same binary and libraries)?
  2. If so, why?

Edit: Several people answered by explaining why the memory can be non-0, which I already explained in the question above. What I'm asking is why a program that uses the contents of what malloc returns may be non-deterministic, i.e. why it could behave differently every time it's run (assuming the same binary and libraries). Non-deterministic behavior is not implied by non-0's. To put it differently: why could the memory have different contents every time the binary is run?

Oleg2718281828
  • 1,039
  • 7
  • 17

10 Answers

11

Malloc does not guarantee unpredictability... it just doesn't guarantee predictability.

E.g., consider that

 return 0;

is a valid implementation of malloc.

user541686
  • 205,094
  • 128
  • 528
  • 886
  • 1
    Mehrdad, my question is about where the non-determinism would come from, rather than whether it's guaranteed. – Oleg2718281828 Jun 28 '12 at 19:42
  • Right, and my answer is that on your implementation, it might not come from anywhere at all. The C standard isn't just for your implementation. – user541686 Jun 28 '12 at 19:43
  • Mehrdad, You are right, I could have been clearer here: "In cases when the contents of malloc are non-deterministic, say on Linux, where does this non-determinism come from?", but this doesn't answer my question. – Oleg2718281828 Jun 28 '12 at 20:07
  • @Oleg2718281828: Not that it makes much of a difference in the answer (i.e. it's just that, for whatever reason, the developers *did not think it was a good idea* to guarantee determinism), but I'd be interested in knowing where it says the contents of `malloc` are nondeterministic *"on Linux"* as you mentioned. You need to realize that *just because a fact is true doesn't mean someone has to guarantee that it stay so*. – user541686 Jun 28 '12 at 20:11
  • *just because a fact is true doesn't mean someone has to guarantee that it stay so*. I realize that perfectly. I'm asking why "the fact may be true (when it is)", rather than whether it's guaranteed. – Oleg2718281828 Jun 28 '12 at 20:19
  • @Oleg2718281828: Sorry, either I'm not understanding what you're saying or you're not understanding what I'm saying, but I have no idea how else to explain it. :\ – user541686 Jun 28 '12 at 20:22
  • Parable example: *Newton*: why do apples fall down (when they do), could it be gravity? *Mehrdad*: They are not guaranteed to fall down, they could just as easily be eaten first (To be sure, I think it's useful info, for those who don't know it, but it just doesn't answer the question) – Oleg2718281828 Jun 28 '12 at 20:39
  • @Oleg2718281828: Uhm, Newton was busy making discoveries. You're merely looking at something another mortal already invented, and trying to explain things that were specifically not guaranteed to be in any particular way. There's a crucial difference here. :-) – user541686 Jun 28 '12 at 20:41
4

The initial values of memory returned by malloc are unspecified, which means that the specifications of the C and C++ languages put no restrictions on what values can be handed back. This makes the language easier to implement on a variety of platforms. While it might be true that in Linux malloc is implemented with brk and sbrk and the memory should be zeroed (I'm not even sure that this is necessarily true, by the way), on other platforms, perhaps an embedded platform, there's no reason that this would have to be the case. For example, an embedded device might not want to zero the memory, since doing so costs CPU cycles and thus power and time. Also, in the interest of efficiency, for example, the memory allocator could recycle blocks that had previously been freed without zeroing them out first. This means that even if the memory from the OS is initially zeroed out, the memory from malloc needn't be.

The conventional wisdom that the values are nondeterministic is probably a good one because it forces you to realize that any memory you get back might have garbage data in it that could crash your program. That said, you should not assume that the values are truly random. You should, however, realize that the values handed back are not magically going to be what you want. You are responsible for setting them up correctly. Assuming the values are truly random is a Really Bad Idea, since there is nothing at all to suggest that they would be.

If you want memory that is guaranteed to be zeroed out, use calloc instead.

Hope this helps!

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
3

malloc is defined on many systems that can be programmed in C/C++, including many non-UNIX systems and many systems that lack an operating system altogether. Requiring malloc to zero out the memory goes against C's philosophy of saving CPU as much as possible.

The standard provides a zeroing call, calloc, that can be used if you need to zero out the memory. But in cases when you are planning to initialize the memory yourself as soon as you get it, the CPU cycles spent making sure the block is zeroed out are a waste; the C standard aims to avoid this waste as much as possible, often at the expense of predictability.

Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523
3

Memory returned by malloc is not zeroed (or rather, is not guaranteed to be zeroed) because it does not need to be. There is no security risk in reusing uninitialized memory pulled from your own process's address space or page pool. You already know it's there, and you already know the contents. There is also no issue with the contents in a practical sense, because you're going to overwrite them anyway.

Incidentally, the memory returned by malloc is zeroed upon first allocation, because an operating system kernel cannot afford the risk of giving one process data that another process owned previously. Therefore, when the OS faults in a new page, it only ever provides one that has been zeroed. However, this is totally unrelated to malloc.

(Slightly off-topic: The Debian security thing you mentioned had a few more implications than using uninitialized memory for randomness. A packager who was not familiar with the inner workings of the code and did not know the precise implications patched out a couple of places that Valgrind had reported, presumably with good intent but to disastrous effect. Among these was the "random from uninitialized memory", but it was by far not the most severe one.)

Damon
  • 67,688
  • 20
  • 135
  • 185
2

I think that the assumption that it is non-deterministic is plain wrong, particularly as you ask about a non-threaded context. (In a threaded context, the randomness of scheduling could introduce some non-determinism.)

Just try it out. Create a sequential, deterministic application that

  • does a whole bunch of allocations
  • fills the memory with some pattern, e.g. the value of a counter
  • frees every second one of these allocations
  • allocates the same amount again
  • runs through these new allocations and records the value of the first byte of each in a file (as textual numbers, one per line)

Run this program twice and record the results in two different files. My expectation is that these files will be identical.

Jens Gustedt
  • 76,821
  • 6
  • 102
  • 177
  • that's what I would expect as well: while you don't know what the program will output, it would output the same thing every time (assuming the same binary/libraries/system), therefore it will be deterministic. I'm going to accept this answer, at least for now. – Oleg2718281828 Jun 28 '12 at 21:09
  • It's a slightly different topic (undefined behavior), but the reasoning applies here as well (see the basketball analogy): http://blog.regehr.org/archives/213 – user541686 Jun 29 '12 at 04:21
1

Even in "normal" single-threaded programs, memory is freed and reallocated many times. Malloc will return to you memory that you had used before.

Ned Batchelder
  • 364,293
  • 75
  • 561
  • 662
1

Even single-threaded code may do malloc then free then malloc and get back previously used, non-zero memory.

Alan Stokes
  • 18,815
  • 3
  • 45
  • 64
1

There is no guarantee that brk/sbrk return 0ed-out data; this is an implementation detail. It is generally a good idea for an OS to do that to reduce the possibility that sensitive information from one process finds its way into another process, but nothing in the specification says that it will be the case.

The fact that malloc is implemented on top of brk/sbrk is also implementation-dependent, and can even vary based on the size of the allocation; for example, large allocations on Linux have traditionally used mmap on /dev/zero instead.

Basically, you can rely neither on malloc()ed regions containing garbage nor on their being all-0, and no program should assume one way or the other.

fluffy
  • 5,212
  • 2
  • 37
  • 67
0

The simplest way I can think of putting the answer is like this:

If I am looking for wall space to paint a mural, I don't care whether it is white or covered with old graffiti, since I'm going to prime it and paint over it. I only care whether I have enough square footage to accommodate the picture, and I care that I'm not painting over an area that belongs to someone else.

That is how malloc thinks. Zeroing memory every time a process ends would be wasted computational effort. It would be like re-priming the wall every time you finish painting.

matchdav
  • 715
  • 1
  • 7
  • 16
-1

There is a whole ecosystem of programs living inside a computer's memory, and you cannot control the order in which mallocs and frees happen.

Imagine that the first time you run your application and malloc() something, it gives you an address with some garbage. Then your program shuts down, your OS marks that area as free. Another program takes it with another malloc(), writes a lot of stuff and then leaves. You run your program again, it might happen that malloc() gives you the same address, but now there's different garbage there, that the previous program might have written.

I don't actually know the implementation of malloc() in any system and I don't know if it implements any kind of security measure (like randomizing the returned address), but I don't think so.

It is very deterministic.

Leonardo
  • 1,834
  • 1
  • 17
  • 23
  • Operating systems basically never implement malloc. The C runtime library does that using some form of heap management on top of operating system functions. For common contemporary operating systems the functions used hand back virtual memory, and for security reasons that memory is almost always guaranteed to be zeroed. – Robin Caron Jul 05 '12 at 04:03