1

I've not found what I am looking for, so I'll formulate my own question.

Consider the C program below:

char heap[<some-static-size>];

int main(void) {
  <this-code-reads-and-writes-to-heap>
}

I would like to execute this program in gdb after first initializing heap. One possibility is to put the contents of the array in a file and somehow feed that to gdb. What is the easiest way to do this? Once main is done, I would like to read the contents of heap and write them to some other file.

Robert
  • Are you not putting the initial data in the source code because it changes frequently? How frequently? If you are going to read the data from a file, will it be in binary form or a form like C source code? (E.g., raw bytes or text like `0x37, 0xfe,…`?) Why not put some lines at the start of `main` to read the data? A simple loop with `scanf` could read the data. If you need to use `gdb`, you could write the file in the form of gdb commands that assign values to locations in the array. – Eric Postpischil Aug 24 '23 at 15:20
  • @EricPostpischil It will change frequently, yes. This is part of a larger project concerning compiler testing. The data for the array will be spit out by the test harness, so I can output either format. It is under my control. I would like to keep this initialization code out of the generated program itself, if possible. – Robert Aug 24 '23 at 16:07
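
The last suggestion in the first comment, a file of gdb commands that assign values to locations in the array, could be generated by the test harness roughly as follows. This is only a sketch: `emit_gdb_init` and its parameters are hypothetical names, and it assumes the generated file is sourced in gdb (for example with `source init.gdb`) once the program is stopped at a breakpoint in `main`.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical harness helper: write one gdb "set var" command per byte,
 * e.g. "set var heap[0] = 0x37".
 * Returns 0 on success, -1 if writing to the stream fails.
 */
int emit_gdb_init(FILE *out, const char *array_name,
                  const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (fprintf(out, "set var %s[%zu] = 0x%02x\n",
                    array_name, i, (unsigned)buf[i]) < 0)
            return -1;
    }
    return 0;
}
```

For large arrays this produces one command per byte, so a raw binary image is likely the more compact format.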

2 Answers

2

The GDB dump and restore commands can be used here. Set breakpoints at main and exit. Use &heap[0] as the start address.

$ cat heap.c
char heap[100];

int main() {
    heap[0]++;
}
$ dd if=/dev/random of=input.bin count=1 ibs=100
1+0 records in
0+1 records out
100 bytes copied, 0.000306025 s, 327 kB/s
$ gdb -q heap
(gdb) b main
(gdb) commands
>restore input.bin binary &heap[0]
>continue
>end
(gdb) b exit
(gdb) commands
>dump binary memory output.bin &heap[0] &heap[100]
>continue
>end
(gdb) run
Breakpoint 1, main () at heap.c:4
4       heap[0]++;
Restoring binary file input.bin into memory (0x555555558040 to 0x5555555580a4)
Breakpoint 2, __GI__exit (status=0) at ../sysdeps/unix/sysv/linux/_exit.c:140
[Inferior 1 (process 9802) exited normally]
(gdb) q
$ cmp -l input.bin output.bin
  1  52  53
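
`restore ... binary` and `dump binary memory` work on raw byte images, so the harness only has to write and read plain binary files. A minimal sketch of that side follows; `write_image`, `read_image` and `first_diff` are hypothetical helper names, not part of gdb.

```c
#include <stdio.h>
#include <stddef.h>

/* Write n raw bytes to path; returns 0 on success, -1 on failure. */
int write_image(const char *path, const unsigned char *buf, size_t n)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(buf, 1, n, f);
    int close_failed = fclose(f);
    return (written == n && close_failed == 0) ? 0 : -1;
}

/* Read exactly n raw bytes from path; returns 0 on success, -1 otherwise. */
int read_image(const char *path, unsigned char *buf, size_t n)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(buf, 1, n, f);
    fclose(f);
    return got == n ? 0 : -1;
}

/* Index of the first differing byte, or n if the images are identical. */
size_t first_diff(const unsigned char *a, const unsigned char *b, size_t n)
{
    size_t i = 0;
    while (i < n && a[i] == b[i])
        i++;
    return i;
}
```

Here `first_diff` plays the role that `cmp -l` plays in the transcript above: the harness can compare the dumped image against the one it expected.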
Mark Plotnick
  • This is precisely what I want to do. It works, apart from gdb giving me an error when I set the breakpoint at `_Exit`. It says "function _Exit not defined". I am sure that I can find some other workaround for finding the end of my function though. – Robert Aug 25 '23 at 08:00
  • I misremembered the C standard. Returning from `main` is equivalent to calling `exit`, not `_Exit`. I've updated my answer. – Mark Plotnick Aug 25 '23 at 10:46
  • I tried for a while to search for solutions to this, but no form of `_Exit` or `exit` works for me; I get the same error. My `main` always appears at the end, and I can infer a suitable breakpoint by inspecting the number of lines in the file, so for my purposes this is not a problem. It would, however, be a better solution if I could manage to set the proper breakpoint. – Robert Aug 26 '23 at 09:36
  • Would it be possible to add a call to `exit(0);` at the end of your `main` and at any point in `main` that does a `return`? – Mark Plotnick Aug 26 '23 at 23:15
  • main never returns prematurely, but I can absolutely add a call to `exit(0)` at the end :) That should solve it. Thanks. – Robert Aug 27 '23 at 14:31
-2

You can create a shared library that does the following:

  1. In a constructor, it reads the environment variable HEAPFILE and uses mmap so that heap is mapped R/W onto that file (Note: trickery involved, see code below).
  2. The target program will have heap set up [magically ;-)].
  3. When the target program completes, the shared library's destructor copies the contents of the [possibly modified] heap to the file named by the environment variable HEAPOUT.

Of course, these filenames could be program arguments, and the target program could call the constructor and destructor functions explicitly, but you wanted the target program to be [largely] unaware of what data it's been given.

You can have many different target programs (i.e. programs that you want to "inject" data into). They only need to be compiled and linked once but many different data files can be used.

So, you can easily do: Test N programs with M data inputs.


Here is the shared library code. Note that although it has some error checking, and it works, it could be cleaned up a bit.

// libtst.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/mman.h>

#if DEBUG
#define dbgprt(_fmt...) \
    printf(_fmt)
#else
#define dbgprt(_fmt...) \
    do { } while (0)
#endif

// exported to target program
char *heap;                             // pointer to heap
size_t heapsize;                        // bytes in heap

// private
static char *heap_file;                 // input file
static char *heap_ofile;                // output file
static int heap_fd;                     // open file descriptor for heap_file
static struct stat heap_st;             // result of stat for heap_file

void __attribute__((constructor))
fakeinit(void)
{

    do {
        // get input file name
        heap_file = getenv("HEAPFILE");
        if (heap_file == NULL)
            break;

        // get output file name
        heap_ofile = getenv("HEAPOUT");
        if (heap_ofile == NULL)
            break;

        // open input file
        heap_fd = open(heap_file,O_RDONLY);
        if (heap_fd < 0)
            break;

        // get input file size (and export this for target program)
        if (fstat(heap_fd,&heap_st) < 0)
            break;
        heapsize = heap_st.st_size;

        // map this for the target program
        // NOTE: by using PROT_READ/PROT_WRITE target program will think this
        // is an ordinary array, but MAP_PRIVATE will prevent original file
        // from being altered
        heap = mmap(NULL,heapsize,
            PROT_READ | PROT_WRITE,
            MAP_PRIVATE,
            heap_fd,0);

        if (heap == MAP_FAILED) {
            dbgprt("fakeinit: FAILED\n");
            heap = NULL;
            break;
        }
    } while (0);
}

void __attribute__((destructor))
fakefini(void)
{

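    do {
        // nothing to write if the constructor did not set up the heap
        if (heap == NULL || heap_ofile == NULL)
            break;

        // open the output file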
        int fdout = open(heap_ofile,O_RDWR | O_CREAT,0644);
        if (fdout < 0)
            break;

        // enlarge output file to correct size
        dbgprt("fakefini: heapsize=%zu\n",heapsize);
        ftruncate(fdout,heapsize);

        // map the output file R/W
        char *outbuf = mmap(NULL,heapsize,
            PROT_READ | PROT_WRITE,
            MAP_SHARED,
            fdout,0);
        if (outbuf == MAP_FAILED) {
            dbgprt("fakefini: FAILED\n");
            break;
        }

        // put data into output file
        memcpy(outbuf,heap,heapsize);

        // unmap/close output file
        munmap(outbuf,heapsize);
        close(fdout);

        // unmap/close input file
        munmap(heap,heapsize);
        close(heap_fd);
    } while (0);
}
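
The `MAP_PRIVATE` note in `fakeinit` is what keeps the input file untouched: stores go to a private copy-on-write copy of the pages, not to the file. A small standalone check of that behavior might look like this, assuming a POSIX system; `private_write_leaves_file_intact` is a hypothetical name and is not part of libtst.c.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map the file at path with MAP_PRIVATE, increment its first byte in memory,
 * then re-read that byte from the file with pread.
 * Returns 1 if the file still holds the original byte, 0 if it changed,
 * and -1 on any error (including an empty file).
 */
int private_write_leaves_file_intact(const char *path)
{
    int result = -1;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        char *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            char original = p[0];
            p[0] = (char)(original + 1);    /* changes the private copy only */

            char on_disk;
            if (pread(fd, &on_disk, 1, 0) == 1)
                result = (on_disk == original);
            munmap(p, (size_t)st.st_size);
        }
    }
    close(fd);
    return result;
}
```

This is also why the library needs a separate output file: with `MAP_PRIVATE`, the modified bytes never reach the input file on their own.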

Here is the sample program to run under test:

// test.c -- sample program to be run under test
#include <stddef.h>

// set by shared library
extern char *heap;
extern size_t heapsize;

int
main(void)
{

    heap[0] += 1;
    heap[1] += 1;
    heap[2] += 1;

    return 0;
}

Here is a build/run script:

#!/bin/bash

cc -o libtst.so -shared -fpic libtst.c -g -DDEBUG=1
cc -o test test.c ./libtst.so -g

echo "abc" > old
rm -f new

env HEAPFILE=old HEAPOUT=new ./test

head -1000 old new

Here is the output of sh -x ./build:

+ cc -o libtst.so -shared -fpic libtst.c -g -DDEBUG=1
+ cc -o test test.c ./libtst.so -g
+ echo abc
+ rm -f new
+ env HEAPFILE=old HEAPOUT=new ./test
fakefini: heapsize=4
+ head -1000 old new
==> old <==
abc

==> new <==
bcd

Note that the above links the program against the shared test library.

This is the easiest approach. But it occurs to me that you'd like the target program to be totally unaware of how heap is initialized.

Although I didn't try it, it should be possible [with some slight modifications, e.g. using ELF weak symbols/aliases] to change libtst.so so that it forces itself on the target programs using:

env LD_PRELOAD=./libtst.so ./test
Craig Estey
  • This is a nice hack, but you are correct in that I don't want my target program to make any assumptions of how heap is initialized. I am sure your answer is helpful to someone without this requirement, however. Thank you! – Robert Aug 25 '23 at 08:01