Checkpointing with LD_PRELOAD -- how to manipulate the instruction pointer and call stack?

Question

The LD_PRELOAD technique allows us to supply our own custom standard library functions to an existing binary, overriding the standard ones or manipulating their behaviour, giving a fun way to experiment with a binary and understand its behaviour.

I've read that LD_PRELOAD can be used to "checkpoint" a program --- that is, to produce a record of the full memory state, call stack and instruction pointer at any given time --- allowing us to "reset" the program back to that previous state at will.

It's clear to me how we can record the state of the heap. Since we can provide our own version of malloc and related functions, our preloaded library can obviously gain perfect knowledge of the memory state.

What I can't work out is how our preloaded functions can determine the call stack and instruction pointer; and then reset them at a later time to the previously recorded value. Clearly this is necessary for checkpointing. Are there standard library functions that can do this? Or is a different technique required?

Employed Russian · Answer 1 · 2022-01-25T05:11:12.187

I've read that LD_PRELOAD can be used to "checkpoint" a program ... allowing us to "reset" the program back to that previous state at will.

That is a gross simplification. This "checkpoint" mechanism can not possibly restore any open file descriptors, or any mutexes, since the state of these is partially inside the kernel.

It's clear to me how we can record the memory state. ...
What I can't work out is how our preloaded functions can determine the call stack and instruction pointer;

The instruction pointer is inside the preloaded function, and is trivially available as e.g. register void *rip __asm__("rip") on x86_64. But you (likely) don't care about that address -- you probably care about the caller of your function. That is also trivially available as __builtin_return_address() (at least when using GCC).

And the rest of the call stack is saved in memory (in the stack region to be more precise), so if you know the contents of memory, you know the call stack.

Indeed, when you use e.g. GDB where command with a core dump, that's exactly what GDB does -- it reads contents of memory from the core and recovers the call stack from it.

Update:

I wrote in my original post that I know how to inspect the memory, but in fact I only know how to inspect the heap. How can I view the full contents of all stack frames?

Inspecting memory works the same regardless of whether that memory "belongs" to heap, stack, or code. You simply dereference a pointer and voilà -- you get the contents of memory at that location.

What you probably mean is:

how to find location of stack and
how to decode it

The answer to the first question is OS-specific, and you didn't tag your question with any OS.

Assuming you are on Linux, one way to locate the stack is to parse entries in /proc/self/maps looking for an entry (continuous address range) which "covers" current stack (i.e. "covers" an address of any local variable).

For the second question, the answer is:

it's complicated¹ and
you don't actually need to decode it in order to save/restore its state.

¹To figure out how to decode stack, you could look at sources for debuggers (such as GDB and LLDB).

This is also very OS and processor specific.

You would need to know calling conventions. On x86_64 you would need to know about unwind descriptors. To find local variables, you would need to know about DWARF debugging format.

Did I mention it's complicated?

Thanks Employed Russian. I wrote in my original post that I know how to inspect the memory, but in fact I only know how to inspect the heap. How can I view the full contents of all stack frames? — atomickitten, Jan 24 '22 at 16:24

Checkpointing with LD_PRELOAD -- how to manipulate the instruction pointer and call stack?

1 Answers1