I have a C++ program that is nondeterministic. I run it with an input file, and it runs ok. I run it with the same input file a second time, and it crashes. I'd like to get rid of the nondeterminism in order to make crashes reproducible.
I threw in some print-statements to print the addresses of certain data structures. With each execution, the data structures are at different addresses (under Linux).
One obvious reason malloc might return unpredictable addresses is ASLR (address space layout randomization). I turned that off, and I can verify that it's off - the shared libraries are always loaded at the same addresses, the stack is always at the same address, and so forth. But even with ASLR off, malloc still isn't reproducible - it returns different addresses on successive runs.
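Besides eyeballing library and stack addresses, the kernel's system-wide ASLR setting can be read directly from /proc. A minimal sketch (the helper name is my own; 0 means ASLR is off, 1 is conservative, 2 is full randomization):

```c
#include <stdio.h>

/* Read the system-wide ASLR knob from /proc/sys/kernel/randomize_va_space:
 * 0 = off, 1 = conservative, 2 = full.
 * Returns -1 if the file can't be read (e.g. not on Linux). */
int aslr_mode(void) {
    FILE *f = fopen("/proc/sys/kernel/randomize_va_space", "r");
    if (!f)
        return -1;
    int mode = -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```

Note that a per-process personality(ADDR_NO_RANDOMIZE) overrides the system-wide knob, so this only tells you the default for newly started processes, not whether your own process was randomized.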
I'm racking my brain to find possible sources of nondeterminism:
I've run the program under 'strace'. I can 'diff' the straces from two successive executions, and the only differences are my own print statements that report the addresses of my data structures (which are at different addresses).
It's not using threads, to my knowledge, unless glibc or the C++ runtime is using threads behind the scenes. I do notice that ptmalloc uses __thread variables... could this be relevant?
No signal handlers other than the default ones. I'm not using signals intentionally.
Theoretically, it would be possible for something in glibc to read a CPU performance counter (e.g. the timestamp counter) and use that as a source of nondeterminism. But I'm skeptical that's what's happening.
Does anybody know what could be the cause of malloc returning different addresses on successive executions?
UPDATE:
Here's the smallest program I have found that exhibits nondeterminism, even with ASLR turned off:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/personality.h>
#include <lua.hpp>

int main(int argc, char **argv) {
    // Turn off ASLR (address space layout randomization), then re-exec
    // ourselves so the process image is rebuilt with randomization disabled.
    const int old_personality = personality(ADDR_NO_RANDOMIZE);
    if (!(old_personality & ADDR_NO_RANDOMIZE)) {
        const int new_personality = personality(ADDR_NO_RANDOMIZE);
        if (new_personality & ADDR_NO_RANDOMIZE) {
            execv(argv[0], argv);
        }
    }
    // Create a Lua engine, then free it.
    lua_State *L = luaL_newstate();
    lua_close(L);
    // Allocate a big block of RAM.
    malloc(4*1024*1024);
    // Now print a hash of the addresses returned by some small mallocs.
    int hash = 0;
    for (int i = 0; i < 100; i++) {
        int n = (int)(intptr_t)malloc(1);
        hash = (hash * 17) + n;
    }
    fprintf(stderr, "%08x\n", hash);
    return 0;
}
Here's the output from three runs:
$ ./foo
c75ba620
$ ./foo
0e2e5210
$ ./foo
7c38ba10
I have no idea why the Lua allocation is relevant, but the nondeterminism disappears without the luaL_newstate/lua_close pair. It also disappears without the 4-megabyte malloc in the middle.
UPDATE 2:
I found the source of the nondeterminism. The Lua library calls time(0) to obtain the current time, and then uses that as a random seed that affects what memory allocations it makes. The reason it took so long to find is that 'strace' doesn't report the call to time(0): on x86-64 Linux, time() is typically serviced through the vDSO (or the legacy vsyscall page) rather than a real syscall, so strace never sees it. I had assumed that all system calls were reported by strace.
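One way to make runs reproducible without patching Lua is to interpose time() itself via LD_PRELOAD. A minimal sketch (the file and library names are my own; build it as a shared object and preload it into the program):

```c
/* dettime.c - interposer that makes time() deterministic.
 * Build: gcc -shared -fPIC -o libdettime.so dettime.c
 * Run:   LD_PRELOAD=./libdettime.so ./foo
 */
#include <time.h>

/* Always report the same instant, so time-seeded code behaves
 * identically on every run. */
time_t time(time_t *tloc) {
    const time_t fixed = 0;
    if (tloc)
        *tloc = fixed;
    return fixed;
}
```

Caveat: this only covers time(); a program that also reads the clock through gettimeofday() or clock_gettime() would need those interposed as well.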