
I have a C++ program that is nondeterministic. I run it with an input file, and it runs ok. I run it with the same input file a second time, and it crashes. I'd like to get rid of the nondeterminism in order to make crashes reproducible.

I threw in some print-statements to print the addresses of certain data structures. With each execution, the data structures are at different addresses (under Linux).

One obvious reason that malloc would return unpredictable addresses would be ASLR. I turned that off. I can verify that it's off - the shared libraries are always loaded at the same addresses, the stack is always at the same address, and so forth. But even with ASLR off, malloc still isn't reproducible - it returns different addresses on different successive runs.

I'm racking my brain to find possible sources of nondeterminism:

  • I've run the program with 'strace'. I can 'diff' the straces from two successive executions, and there are no diffs except the print-statements that print the addresses of my data structures (which are at different addresses).

  • It's not using threads, to my knowledge, unless glibc or C++ is using threads behind the scenes. I do notice that ptmalloc uses __thread variables... could this be relevant?

  • No signal handlers other than the default ones. I'm not using signals intentionally.

  • Theoretically, it would be possible for something in glibc to get one of the CPU performance counters, and use that as a source of nondeterminism. But I'm skeptical that's what's happening.

Does anybody know what could be the cause of malloc returning different addresses on successive executions?

UPDATE:

Here's the smallest program I have found that exhibits nondeterminism, even with ASLR turned off:


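#include <sys/personality.h>  // personality, ADDR_NO_RANDOMIZE
#include <unistd.h>           // execv
#include <cstddef>            // ptrdiff_t
#include <cstdio>             // fprintf
#include <cstdlib>            // malloc
#include <lua.hpp>            // luaL_newstate, lua_close (C++ wrapper shipped with Lua)

int main(int argc, char **argv) {
    // Turn off ASLR (address space layout randomization).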
    const int old_personality = personality(ADDR_NO_RANDOMIZE);
    if (!(old_personality & ADDR_NO_RANDOMIZE)) {
       const int new_personality = personality(ADDR_NO_RANDOMIZE);
       if (new_personality & ADDR_NO_RANDOMIZE) {
           execv(argv[0], argv);
       }
    }
    
    // Create a lua engine, then free it.
    lua_State *L = luaL_newstate();
    lua_close(L);

    // Allocate a big block of RAM.
    malloc(4*1024*1024);
    
    // Now print the hash of some mallocs.
    // Use unsigned arithmetic so the hash wraps instead of overflowing a signed int.
    unsigned hash = 0;
    for (int i = 0; i < 100; i++) {
        unsigned n = (unsigned)(ptrdiff_t)malloc(1);
        hash = (hash * 17) + n;
    }
    fprintf(stderr, "%08x\n", hash);
}

Here's the output from three runs:

$ ./foo
c75ba620
$ ./foo
0e2e5210
$ ./foo
7c38ba10

I have no idea why the Lua allocation is relevant, but the nondeterminism disappears without the luaL_newstate and lua_close calls. It also disappears without the 4-megabyte malloc in the middle.

UPDATE 2:

I found the source of nondeterminism. The lua library is calling time(0) to obtain the current time, and then it's using that as a random seed that affects what memory allocations it makes. The reason it took so long to find this is that 'strace' isn't reporting the syscall to 'time(0).' I had assumed that all system calls were reported by strace.

jyelon
  • If you remove nondeterminism you might just make it not crash ever. I don’t think it’ll help you find out what’s wrong, a debugger and making sure there’s zero warning with max whiny settings on the computer should help to find the actual issue – Sami Kuhmonen Feb 18 '22 at 06:28
  • To be able to answer your question, we need your code. – K.R.Park Feb 18 '22 at 06:34
  • However, I believe that well-formed C++ programs should not rely on the actual addresses malloc returns, except for hardware control and driver programs. I think you should check the validity of your program. – K.R.Park Feb 18 '22 at 06:36
  • This is a good usecase for a debugger. Go through your code line by line and inspect where something undesired happens by inspecting values of variables. – Raildex Feb 18 '22 at 06:37
  • Try using valgrind or address sanitizer. – n. m. could be an AI Feb 18 '22 at 06:38
  • Linux provides new processes with some entropy via the `AT_RANDOM` field of the process auxiliary vector. I don't know whether glibc uses it to randomize `malloc` layout, but if I remember correctly at least musl does. Then of course glibc could also be querying entropy itself, e.g. from `/dev/random`, `/dev/urandom` or a CPU instruction. – user17732522 Feb 18 '22 at 06:38
  • 1) Does gdb list more than one thread? Just to back up the "No threads to my knowledge" assumption. 2) Does a simple test program with a single (or several successive) `malloc`s in main() produce the same addresses each time you run it? – Igor G Feb 18 '22 at 06:45
  • igor: gdb only shows one thread. Good idea. And a simple test program *does* produce the same addresses every time I run it. I only get nondeterminism when I have a larger, more complex program. I've been unable to reproduce the effect in a small program. – jyelon Feb 18 '22 at 06:52
  • There could be a million reasons for this, you really need to provide some example code that exhibits the behaviour. – Galik Feb 18 '22 at 07:36

3 Answers


Does anybody know what could be the cause of malloc returning different addresses on successive executions?

Assuming the process is actually single-threaded and no block address randomization is done by malloc, I guess:

Your program may have allocated a random amount of memory at some point, for example by using an unreliable garbage value when calculating the size of some block to allocate. Addresses returned by all subsequent allocations might be affected (and thus randomized) by that.

Of course, that will depend on malloc implementation and actual size of allocation: if it is implemented as a per-thread bucket allocator, and the random block size doesn't exceed the bucket size, then the impact on subsequent allocations would be next to none.

Igor G
  • It is true that a call to malloc with a random parameter would affect all future allocations. But my program isn't using any "true" randomness (it isn't reading /dev/random). So if it makes a "random" call to malloc the first time you run it, it should make the *exact same* "random" call to malloc the second time. – jyelon Feb 18 '22 at 07:04
  • Randomness doesn't have to be "true" :- ) A poor unintended randomness would do just as well. For example, it may come from reading an uninitialized variable or from other undefined behavior. See, I'm not insisting. I'm just guessing what could have caused the behavior you see. Sorry if that didn't help. – Igor G Feb 18 '22 at 07:12
  • @IgorG Reading uninitialized memory should not introduce entropy, since I am pretty sure Linux zeroes all pages before they are handed to a process, unless specific config flags for embedded devices are set. – user17732522 Feb 18 '22 at 07:27

In this case, the nondeterminism was coming from inside the Lua runtime. Lua is using 'time(0)' to generate a random seed, which then affects what malloc calls are made by Lua.

The reason that this source of nondeterminism was hidden from me is that Linux 'strace' doesn't report the call to 'time(0)'. I had assumed that strace would show me any system calls that returned different values on successive executions.

jyelon
  • `time` is probably implemented via [vdso](https://man7.org/linux/man-pages/man7/vdso.7.html) to avoid the syscall overhead. `strace` can't detect that. It may be helpful to also look at `ltrace` output in such a situation. – user17732522 Feb 18 '22 at 09:41

You are probably the victim of a memory leak problem. Your program uses memory that it did not allocate. If that address happens to fall within a memory block where you do have something allocated, the program does not crash. But if the program accesses memory in a block that it has no right to read or write, the operating system kills your program, and you see this as a crash.

ufok
  • First, this is not what a memory leak is and second, the question is not why the crash happens, but how to make it deterministic. – user17732522 Feb 18 '22 at 06:43