
Is malloc deterministic? Say I have a forked process, that is, a replica of another process, and at some point both of them call the malloc function. Would the allocated address be the same in both processes, assuming that other parts of execution are also deterministic?

Note: Here, I'm only talking about virtual memory, not physical memory.

Jens
MetallicPriest

7 Answers


There is no reason at all for it to be deterministic; in fact, there can be some benefit to it not being deterministic, for example making bugs harder to exploit (see also this paper).

This randomness can be helpful at making exploits harder to write. To successfully exploit a buffer overflow you typically need to do two things:

  1. Deliver a payload into a predictable/known memory location
  2. Cause execution to jump to that location

If the memory location is unpredictable, making that jump can become quite a lot harder.

The relevant quote from the C99 standard, §7.20.3.3/2:

The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate

If it were the intention to make it deterministic, then that would be clearly stated.

Even if it looks deterministic today I wouldn't bet on it remaining so with a newer kernel or a newer libc/GCC version.

Flexo
  • I don't think security has much relevance to the question. Nevertheless, what you wrote about exploiting is correct. – jweyrich Nov 17 '11 at 17:00
  • @jweyrich - the C standard explicitly states that it's not deterministic. The fact that it's not deterministic can be useful for implementations in a variety of ways - I used address space randomisation as an obvious modern example, but there are also other less obvious reasons (implementations where there is no such thing as virtual memory spring to mind). – Flexo Nov 17 '11 at 17:05
  • I didn't say otherwise. Your answer is ok, though IMO quoting that part of the standard would be the definitive answer. – jweyrich Nov 17 '11 at 17:13
  • When a multithreaded process `fork`s, the newly created process only has a single thread. I could think of a `malloc` implementation that serves memory from per-thread pools preferably, thus avoiding locks. Such an implementation would merge the pools in the `fork`ed process. – Simon Richter Nov 17 '11 at 17:45
  • @SimonRichter - I have to admit I'd always assumed `fork()` copied all threads, but you're right and the spec is very clear on that too - http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html - though if I remember correctly `malloc()` isn't async-signal-safe, which means you can't legally call it in a child after a multithreaded `fork()`, since the spec seems to impose the async-signal-safe requirement on the child in that case. – Flexo Nov 17 '11 at 17:53
  • malloc isn't safe after `vfork` -- but after a `fork` it is completely fine. – Simon Richter Nov 17 '11 at 21:09
  • @SimonRichter - from the `fork()` doc I linked to: *"If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called."* - am I missing something there then? – Flexo Nov 18 '11 at 10:38
  • This is actually a good point. If the `fork()` happens while another thread is in `malloc()`, the state of the allocator will be invalid, and the thread that would clean it up does not exist in the new process. So you are correct, `malloc()` after `fork()` is unsafe. – Simon Richter Nov 18 '11 at 10:52
  • `malloc` after `fork` is safe if you can ensure that the process is not multi-threaded, but I see no way to do that. There seems to be no guarantee that a POSIX implementation does not use threads "behind the scenes" as part of implementing some other interfaces... – R.. GitHub STOP HELPING ICE Nov 18 '11 at 16:27

The C99 spec (at least, in its final public draft) states in 'J.1 Unspecified behavior':

The following are unspecified: ... The order and contiguity of storage allocated by successive calls to the calloc, malloc, and realloc functions (7.20.3).

So it would seem that malloc doesn't have to be deterministic. It therefore isn't safe to assume that it is.

Tommy
  • I don't think you're wrong, but that quotation speaks strictly about contiguity of storage, which has no relation to determinism. An implementation could be deterministic and yet not allocate contiguous memory in successive calls. Or am I wrong? – jweyrich Nov 17 '11 at 17:06
  • jweyrich, I agree with you. Determinism has nothing to do with order and contiguity of storage allocated. – MetallicPriest Nov 17 '11 at 17:26
  • It speaks about both contiguity and order. So my reading is that the spec specifies no required behaviour for the order of returned storage. So, amongst other things, it doesn't specify that it must be deterministic. A valid implementation therefore may or may not be deterministic. – Tommy Nov 17 '11 at 17:38

That depends entirely on the malloc implementation. There's no inherent reason why a particular malloc implementation would introduce non-determinism (except possibly as an application fuzzing test, but even then it ought to be disabled by default). For example, Doug Lea's malloc does not use rand(3) or any similar methods in it.

But, since malloc makes calls to the kernel such as sbrk(2) or mmap(2) on Linux or VirtualAlloc on Windows, those system calls may not always be deterministic, even in otherwise identical processes. The kernel may decide to intentionally provide different mmap'ed addresses in different processes for whatever reason.

So for small allocations, which are usually serviced in user space without a system call, it will quite likely be the case that the resulting pointers will be the same after a fork(); large allocations that are serviced by a system call may or may not be the same.

In general, though, do not depend on it. If you really need identical pointers in separate processes, either create them before forking, or use shared memory and share them appropriately.

Adam Rosenfield

It depends on the detailed implementations of malloc. A typical malloc implementation (e.g., dlmalloc) used to be deterministic. This is simply because the algorithm itself is deterministic.

However, because of security attacks such as heap overflow exploits, some heap managers (malloc implementations) have introduced a degree of randomness. (Its entropy is relatively small, because heap managers must also care about speed and space.) So you should not assume strict determinism from a heap manager.

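Also, there are other sources of randomness outside malloc itself, including ASLR. (On Linux, ASLR chooses the layout when a program is exec'ed and a forked child inherits it, so it mainly makes addresses differ between runs; where later mmap calls land is up to the kernel.)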

minjang

Yes, it's deterministic to some degree, but that doesn't necessarily mean it'll give identical results in two forks of a process.

Just for example, the Single Unix Specification says: "[...] to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called."

For better or worse, malloc is not in the list of "async-signal-safe" functions.

This limitation is in a section that discusses multithreaded programs, but doesn't specify whether the limitation applies only to multithreaded programs, or also applies to single threaded programs.

Conclusion: you can't count on malloc producing identical results in the parent and the child. If the program is multithreaded, you can't count on malloc working at all in the child, until it has called exec--and there's room for reasonable question whether it's actually guaranteed to work even in a single-threaded child before the child calls exec.

References:

  1. fork specification
  2. async-signal safe functions
Jerry Coffin
  • [POSIX `fork(2)`](http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html) *does* reproduce the entire virtual address space perfectly. All pointers are still valid when `fork()` returns, in both the parent and child. (This of course requires virtual memory to have two processes using the same virtual addresses, and it makes an efficient copy-on-write implementation possible.) Last I read, Windows doesn't natively provide a fork() system call. cygwin has to emulate it, and it's not easy. So if you're used to Windows system calls, fork() will seem weird I guess. – Peter Cordes Sep 12 '17 at 06:24
  • Yeah, `mmap` is non-deterministic of course, so the answer to the whole question is a definite "no". I was commenting specifically on your claim that "fork would have to reproduce the heap (including all free blocks) identically.", which it does, so that's not the obstacle. – Peter Cordes Sep 12 '17 at 07:09
  • "Despite being a fork, the clone of a process may have its address space laid out considerably differently from the original." is wrong. Anyway, there are correct answers already posted, so I'd suggest deleting this wrong one you posted 6 years ago :P – Peter Cordes Sep 12 '17 at 07:10
  • @PeterCordes: I prefer to improve posts, especially since none of them was previously very definitive. – Jerry Coffin Sep 12 '17 at 07:35
  • Yup, that works too, since apparently there was some interesting stuff to say that nobody else had said yet :) Nice research. – Peter Cordes Sep 12 '17 at 18:25

You won't get the same physical address. If you have processes A and B, each call to malloc returns the address of a free block. The order in which A and B call malloc is not predictable, but it never happens "at the same moment".

Paolo

Technically, if the forked processes both request the same size of block, they should get the same address allocated, but each of those addresses will point to a different physical/real memory location.

Linux uses copy-on-write for fork, so forked children share their parent's memory until something is changed in either process. At that point the kernel copies the affected memory so that the forked child gets its own dedicated copy of its memory space.

Marc B
  • I am not talking about real memory, only virtual memory. I know about copy-on-write and virtual memory management. – MetallicPriest Nov 17 '11 at 16:53
  • *"Technically, if the forked processes both request the same size of block, they should get the same address allocated"* - That's not true at all - the value of addresses given by `malloc` is unspecified and often the randomness is introduced by the kernel call itself, not anything in user space, so the libc implementation doesn't have to call `rand()` or anything crazy like that to make it non-deterministic. – Flexo Nov 17 '11 at 16:55
  • malloc is going to try to prevent address space fragmentation and won't allocate blocks at random. But when the kernel maps the process' virtual memory space to physical, THAT mapping can be randomized. – Marc B Nov 17 '11 at 16:56
  • @MarcB - `malloc` is possibly just a call to `sbrk` or `mmap` (since we're looking for deterministic we can't safely assume that there will have been sufficient memory left over from a `free` or previous `sbrk`/`mmap` call to handle the request), neither of those calls promise anything more than trying to fulfil the request itself. The new pointer from a call to either of those can sensibly be randomised within the available virtual address space – Flexo Nov 17 '11 at 17:19
  • @Flexo: `sbrk` is deterministic. Both parent and child will have their break at the same place, because `fork()` duplicates the parent's address space. On Linux, the underlying system call is [`void *sys_brk(void*)`](http://man7.org/linux/man-pages/man2/brk.2.html) and just sets the break, returning the new value. glibc keeps track of the old break to know what value to pass for `sbrk` increment function calls. – Peter Cordes Sep 12 '17 at 06:38
  • Sure, but there's no requirement in C for malloc to be implemented in terms of sbrk and glibc isn't the only libc in town either, so whilst in some implementations it is true that the behaviour might appear deterministic there's no particular reason to assume that's true of all implementations, or even of the next release of a given implementation. – Flexo Sep 12 '17 at 06:51