You can definitely do this, but the exact procedure differs from platform to platform, with a few things in common. Your tool of choice here is probably going to be `llvm-objcopy`.
At a high level, you create a special segment or section in the binary (or both, as in the case of Mach-O, where every section lives inside a segment) containing the data you want, and then you parse your own executable to retrieve it. Since you said you're on a Mac, we can start there as an example.
Create the dumbest possible test program as a starting point:
`test.c`:

```c
#include <stdio.h>

int main(int argc, char **argv)
{
    printf("I'm a binary!\n");
    return 0;
}
```
Compile and run it:
```
prompt$ clang -o test test.c
prompt$ ./test
I'm a binary!
```
Now create a text file `hello.txt` and put some text in it:

```
Hello world!
```
You can embed this into the Mach-O file with `llvm-objcopy` (for Mach-O, the section name is given as `<segment>,<section>`):

```
llvm-objcopy --add-section __MAGIC,__magic_section=hello.txt test test
```
Then check that it still runs:

```
prompt$ ./test
I'm a binary!
```
You can verify that the section has been added using `otool -l`. You can also run `strings` on the binary, and you should see your text:

```
prompt$ strings ./test
I'm a binary!
Hello world!
```
Now you have to be able to retrieve this data. If the section were part of the build a priori, this would be easy, since the linker would create symbols for you marking the start and end of the `__magic_section` section.
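For comparison, here is a minimal sketch of that a priori route. It assumes the data is compiled in with a `section` attribute (on macOS you could instead link a file in with `-Wl,-sectcreate,__MAGIC,__magic_section,hello.txt`). ld64 resolves the special names `section$start$SEG$SECT` and `section$end$SEG$SECT`, while GNU ld and lld define `__start_NAME` and `__stop_NAME` for ELF sections whose names are valid C identifiers. `magic_payload` and `magic_data` are hypothetical names used only for illustration.

```c
#include <stddef.h>

#ifdef __APPLE__
/* Hypothetical payload; on Mach-O the section name is "segment,section". */
__attribute__((used, section("__MAGIC,__magic_section")))
static const char magic_payload[] = "Hello world!\n";

/* ld64 resolves these special symbol names to the section's bounds. */
extern const char magic_start[] __asm("section$start$__MAGIC$__magic_section");
extern const char magic_end[] __asm("section$end$__MAGIC$__magic_section");
#else
/* ELF: the section name must be a valid C identifier for the linker
   to synthesize the __start_/__stop_ symbols. */
__attribute__((used, section("magic_section")))
static const char magic_payload[] = "Hello world!\n";

extern const char __start_magic_section[];
extern const char __stop_magic_section[];
#define magic_start __start_magic_section
#define magic_end __stop_magic_section
#endif

/* Returns the start of the embedded section and stores its size in *len. */
const char *magic_data(size_t *len)
{
    *len = (size_t)(magic_end - magic_start);
    return magic_start;
}
```

This only works when the section exists at link time, which is exactly why the a posteriori case needs the parsing described next.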
Since you specifically said this has to be an a posteriori step, you're going to have to actually self-parse the Mach-O file to look up the `__magic_section` section in the `__MAGIC` segment that you added. You can find a few different references for parsing Mach-O files, but you probably also want to make use of the built-in `dyld` functionality. See Apple's reference on the `dyld` utility calls, which can, for example, give you the Mach header for the running process. Linux has similar functionality by way of `dl_iterate_phdr`, although it only reports program headers (segments), so a section that is not part of a loaded segment still has to be read from the file on disk.
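A sketch of those runtime lookups, assuming on macOS that the section `llvm-objcopy` added ends up in a mapped segment. There, `_dyld_get_image_header(0)` returns the header of image 0 (by convention the main executable), and `getsectiondata()` from `<mach-o/getsect.h>` returns the section's in-memory address, already adjusted for the ASLR slide. On Linux, `dl_iterate_phdr` visits the main program first, and its ELF header sits in the loadable segment that starts at file offset 0. `self_image_header` and `self_magic_section` are hypothetical names.

```c
#define _GNU_SOURCE /* dl_iterate_phdr on glibc */
#include <stddef.h>
#include <stdint.h>

#ifdef __APPLE__
#include <mach-o/dyld.h>
#include <mach-o/getsect.h>

/* Image 0 is, by convention, the main executable. */
const void *self_image_header(void)
{
    return _dyld_get_image_header(0);
}

/* Returns the in-memory (slide-adjusted) contents of __MAGIC,__magic_section,
   or NULL if the image has no such section. */
const uint8_t *self_magic_section(unsigned long *size)
{
    const struct mach_header_64 *mh =
        (const struct mach_header_64 *)_dyld_get_image_header(0);
    return getsectiondata(mh, "__MAGIC", "__magic_section", size);
}
#else
#include <link.h>

/* dl_iterate_phdr visits the main program first. Its ELF header lives at the
   start of the loadable segment that begins at file offset 0. */
static int find_main_header(struct dl_phdr_info *info, size_t size, void *out)
{
    (void)size;
    for (size_t i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_LOAD && ph->p_offset == 0) {
            *(const void **)out = (const void *)(uintptr_t)(info->dlpi_addr + ph->p_vaddr);
            break;
        }
    }
    return 1; /* nonzero stops the iteration after the main program */
}

const void *self_image_header(void)
{
    const void *hdr = NULL;
    dl_iterate_phdr(find_main_header, &hdr);
    return hdr;
}
#endif
```

On Linux this only gets you the loaded headers; a section added with `objcopy` is normally not loaded, so you would still read it from the file.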
Once you know where the section is in your original binary, you can retrieve the text.
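For the file-based route, here is a minimal sketch that walks the load commands of a thin 64-bit Mach-O image already read into memory (for example, after locating your own executable with `_NSGetExecutablePath` from `<mach-o/dyld.h>`). The structs are hand-written mirrors of `mach_header_64`, `load_command`, `segment_command_64` and `section_64` from `<mach-o/loader.h>`, renamed with a hypothetical `mo_` prefix so the code also compiles off macOS. Fat (universal) binaries and 32-bit images are not handled, and a little-endian host is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hand-written mirrors of the <mach-o/loader.h> types (hypothetical mo_ prefix).
   Assumes a thin 64-bit image and a little-endian host, as on x86_64 and arm64 Macs. */
#define MO_MH_MAGIC_64 0xfeedfacfu
#define MO_LC_SEGMENT_64 0x19u

struct mo_header64 {
    uint32_t magic;
    int32_t cputype, cpusubtype;
    uint32_t filetype, ncmds, sizeofcmds, flags, reserved;
};
struct mo_load_command { uint32_t cmd, cmdsize; };
struct mo_segment64 {
    uint32_t cmd, cmdsize;
    char segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    int32_t maxprot, initprot;
    uint32_t nsects, flags;
};
struct mo_section64 {
    char sectname[16], segname[16];
    uint64_t addr, size;
    uint32_t offset, align, reloff, nreloc, flags, reserved1, reserved2, reserved3;
};

/* Looks up segname,sectname in a Mach-O image held in buf[0..len).
   On success stores the section's file offset and size and returns 0; returns -1 otherwise. */
int macho_find_section(const uint8_t *buf, size_t len, const char *segname,
                       const char *sectname, uint64_t *off, uint64_t *size)
{
    struct mo_header64 mh;
    if (len < sizeof mh)
        return -1;
    memcpy(&mh, buf, sizeof mh);
    if (mh.magic != MO_MH_MAGIC_64)
        return -1;

    size_t pos = sizeof mh; /* load commands follow the header back to back */
    for (uint32_t i = 0; i < mh.ncmds; i++) {
        struct mo_load_command lc;
        if (len - pos < sizeof lc)
            return -1;
        memcpy(&lc, buf + pos, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > len - pos)
            return -1;

        if (lc.cmd == MO_LC_SEGMENT_64 && lc.cmdsize >= sizeof(struct mo_segment64)) {
            struct mo_segment64 seg;
            memcpy(&seg, buf + pos, sizeof seg);
            /* section_64 records sit directly after the segment command */
            size_t spos = pos + sizeof seg;
            for (uint32_t s = 0; s < seg.nsects; s++, spos += sizeof(struct mo_section64)) {
                struct mo_section64 sect;
                if (spos + sizeof sect > pos + lc.cmdsize)
                    return -1;
                memcpy(&sect, buf + spos, sizeof sect);
                if (strncmp(seg.segname, segname, 16) == 0 &&
                    strncmp(sect.sectname, sectname, 16) == 0) {
                    *off = sect.offset;
                    *size = sect.size;
                    return 0;
                }
            }
        }
        pos += lc.cmdsize;
    }
    return -1;
}
```

When it returns 0, the text you embedded is the `size` bytes starting at `buf + off`.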
To repeat all of this on Linux, you'll do pretty much the same thing, but you'll be working with the ELF file format instead of Mach-O. The same principles apply, though.
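On Linux, `objcopy --add-section` (or `llvm-objcopy --add-section`) takes a plain section name such as `.magic_section=hello.txt`, and the section is found through the section header table rather than through load commands. A minimal sketch for a 64-bit ELF image already in memory (for example, read from `/proc/self/exe`), using the types from glibc's `<elf.h>`. `elf_find_section` is a hypothetical name, and the extended numbering used by files with very large section counts is not handled.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Looks up a section by name in a 64-bit ELF image held in buf[0..len).
   On success stores its file offset and size and returns 0; returns -1 otherwise. */
int elf_find_section(const uint8_t *buf, size_t len, const char *name,
                     uint64_t *off, uint64_t *size)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, buf, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 || eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    if (eh.e_shentsize != sizeof(Elf64_Shdr) || eh.e_shstrndx >= eh.e_shnum)
        return -1;
    if (eh.e_shoff > len || (uint64_t)eh.e_shnum * sizeof(Elf64_Shdr) > len - eh.e_shoff)
        return -1;

    /* The section-name string table is itself a section, indexed by e_shstrndx. */
    Elf64_Shdr strtab;
    memcpy(&strtab, buf + eh.e_shoff + (size_t)eh.e_shstrndx * sizeof strtab, sizeof strtab);
    if (strtab.sh_offset > len || strtab.sh_size > len - strtab.sh_offset)
        return -1;
    const char *names = (const char *)buf + strtab.sh_offset;
    size_t want = strlen(name) + 1; /* compare including the terminating NUL */

    for (size_t i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        memcpy(&sh, buf + eh.e_shoff + i * sizeof sh, sizeof sh);
        if (sh.sh_name < strtab.sh_size && want <= strtab.sh_size - sh.sh_name &&
            memcmp(names + sh.sh_name, name, want) == 0) {
            *off = sh.sh_offset;
            *size = sh.sh_size;
            return 0;
        }
    }
    return -1;
}
```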
As a side note: this is essentially how code signing works on macOS. A signature is generated and embedded in the binary (in the `__LINKEDIT` segment, located through the `LC_CODE_SIGNATURE` load command), where the system reads it on launch.