
I have a rather weird question but I don't really know how to put it or where to start looking from.

My question is not about "embedding" a text file at compile time (we can already do that) - that is too obvious.

My question is whether (and how) I could, let's say, "package" an existing binary (compiled from C) with a text file and generate a new, working binary with access to that file.

I'm a Mac user. I know that could work with an .app package and all that. But that's still not what I want. I want to be able to "tweak" an existing binary, add some (accessible - how?) additional text data to it, and the binary remaining absolutely functional.

Is that even possible?

P.S. The only serious tool I've looked into is bsdiff and bspatch but I'm not really sure it's what I'm looking for.

Dr.Kameleon
  • Can you clarify what you mean by "accessible?" Like are you expecting a filesystem explorer like Finder to recognize it as a text file? Or you just mean you could run a separate tool to extract the text file later and you don't care what that tool is? Do you care about cryptographic signing of the binary? – Jon Reeves May 13 '21 at 16:04
  • @JonReeves I mean I have to somehow be able to access this a posteriori added information/text from within the compiled binary. – Dr.Kameleon May 13 '21 at 21:01
  • Got it, thanks for the clarification. The answer to your question is definitely "yes", but it's complicated, and platform dependent. I can provide a barebones answer but it will just give you a start. – Jon Reeves May 13 '21 at 21:24

1 Answer


You can definitely do this, but the exact procedure is going to be different for every platform, with a few commonalities. Your tool of choice here is probably going to be llvm-objcopy.

At a high level, you will create a special segment or section in the binary (or both, as in the case of Mach-O) containing the data you want, and then you'll have to parse your own executable to retrieve it. Since you said you're on a Mac, we can start there as an example.

Create the dumbest possible test program as a starting point:

test.c

#include <stdio.h>

int main(int argc, char **argv)
{
    printf("I'm a binary!\n");
    return 0;
}

Compile and run it:

prompt$ clang -o test test.c
prompt$ ./test
I'm a binary!

Now create a text file hello.txt and put some text in it:

Hello world!

You can embed this into the Mach-O file with llvm-objcopy:

prompt$ llvm-objcopy --add-section __MAGIC,__magic_section=hello.txt test test

Then check that it still runs:

prompt$ ./test
I'm a binary!

You can verify that the section has been added using otool -l. You can also run strings on the binary, and you should see your text:

prompt$ strings ./test
I'm a binary!
Hello world!

Now you have to be able to retrieve this data. If you were compiling everything in up front (a priori), it would be easy, since the linker would make symbols for you marking the start and end of the __magic_section section that you added.
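For comparison, here is a rough sketch of that compile-time route, where the linker supplies the boundary symbols. The names magic_text and magic_data, and the ELF section name magic_section, are made up for illustration. The symbol spellings are the ones I'd expect Apple's ld and GNU ld/lld to synthesize, so treat this as a sketch under those assumptions rather than a definitive recipe.

```c
#include <stddef.h>

#ifdef __APPLE__
/* Mach-O: "segment,section". Apple's linker resolves these special
 * symbol names to the start and end of the named section. */
#define MAGIC_SECTION "__MAGIC,__magic_section"
extern const char magic_start[] __asm("section$start$__MAGIC$__magic_section");
extern const char magic_end[]   __asm("section$end$__MAGIC$__magic_section");
#else
/* ELF: GNU ld and lld define __start_<name> / __stop_<name> for sections
 * whose names are valid C identifiers (hypothetical name used here). */
#define MAGIC_SECTION "magic_section"
extern const char __start_magic_section[];
extern const char __stop_magic_section[];
#define magic_start __start_magic_section
#define magic_end   __stop_magic_section
#endif

/* The data itself, placed in the custom section at compile time.
 * "used" keeps the compiler from discarding the unreferenced array. */
__attribute__((used, section(MAGIC_SECTION)))
static const char magic_text[] = "Hello world!\n";

/* Stores the start of the section in *out and returns its size in bytes. */
size_t magic_data(const char **out)
{
    *out = magic_start;
    return (size_t)(magic_end - magic_start);
}
```

The returned size covers the whole section, so here it includes the string's terminating NUL byte. None of this helps with an already-built binary, which is why the rest of the answer parses the file instead.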

Since you specifically said this has to be an a posteriori step, you're going to have to self-parse the Mach-O file to look up the __magic_section section in the __MAGIC segment that you added. You can find a few different references for parsing Mach-O files, but you probably also want to make use of built-in dyld functionality. See Apple's reference on dyld utility calls, which can, for example, give you the Mach header for the running process. Linux has similar functionality by way of dl_iterate_phdr.

Once you know where the section is in your original binary, you can retrieve the text.
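As a starting point on the Mac side, something like the following might work. It uses _dyld_get_image_header and getsectiondata from the system headers; the wrapper name find_own_section is hypothetical. It assumes image 0 is the main executable, that the binary is 64-bit, and that the segment llvm-objcopy added actually gets mapped by the loader. I haven't verified that last point on every macOS version, so if the lookup returns NULL you may need to open the executable on disk and walk its load commands yourself.

```c
#include <stddef.h>

#ifdef __APPLE__
#include <stdint.h>
#include <mach-o/dyld.h>     /* _dyld_get_image_header */
#include <mach-o/getsect.h>  /* getsectiondata */
#endif

/*
 * Look up a section of the running executable by segment and section name.
 * Returns a pointer to the mapped bytes and stores the length in *size.
 * Returns NULL (with *size set to 0) when the section is absent, and
 * always on non-Apple platforms, where this sketch is only a stub.
 */
const unsigned char *find_own_section(const char *seg, const char *sect,
                                      size_t *size)
{
    *size = 0;
#ifdef __APPLE__
    /* Assumption: image 0 is the main executable and it is 64-bit Mach-O. */
    const struct mach_header_64 *mh =
        (const struct mach_header_64 *)_dyld_get_image_header(0);
    unsigned long n = 0;
    const uint8_t *p = getsectiondata(mh, seg, sect, &n);
    if (p != NULL)
        *size = (size_t)n;
    return p;
#else
    (void)seg;
    (void)sect;
    return NULL;
#endif
}
```

In the test program above you would call find_own_section("__MAGIC", "__magic_section", &n) from main and, if the result is non-NULL, write n bytes from it to stdout. Note that hello.txt is copied in as raw bytes, so don't assume the data is NUL-terminated.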

To repeat all of this on Linux, you will do pretty much the same thing, but you'll be working with the ELF file format instead of Mach-O. The same principles apply, though.
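Here is a minimal sketch of the Linux side, assuming a 64-bit ELF binary whose section header table has not been stripped. A section added with objcopy --add-section is normally not loaded into memory, so this version reads the file on disk (for example /proc/self/exe) rather than the process image. The function name read_elf_section and the section name .magic in the usage note are hypothetical.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Read the named section from an ELF64 file on disk.
 * Returns a malloc'd, NUL-terminated copy (caller frees) and stores the
 * section length in *size; returns NULL if the file is not ELF64 or the
 * section is missing or has no bytes in the file.
 */
unsigned char *read_elf_section(const char *path, const char *name,
                                size_t *size)
{
    unsigned char *result = NULL;
    Elf64_Shdr *sh = NULL;
    char *names = NULL;
    size_t names_len, i;
    Elf64_Ehdr eh;
    FILE *f = fopen(path, "rb");

    *size = 0;
    if (f == NULL)
        return NULL;

    /* Validate the ELF header before trusting any offsets in it. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
        goto done;

    /* Load the section header table. */
    sh = malloc((size_t)eh.e_shnum * sizeof *sh);
    if (sh == NULL || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum)
        goto done;

    /* Load the string table that holds the section names. */
    names_len = (size_t)sh[eh.e_shstrndx].sh_size;
    names = malloc(names_len + 1);
    if (names == NULL ||
        fseek(f, (long)sh[eh.e_shstrndx].sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, names_len, f) != names_len)
        goto done;
    names[names_len] = '\0';

    /* Find the first section with a matching name and copy its bytes. */
    for (i = 0; i < eh.e_shnum; i++) {
        size_t len = (size_t)sh[i].sh_size;
        if (sh[i].sh_name >= names_len ||
            strcmp(names + sh[i].sh_name, name) != 0)
            continue;
        if (sh[i].sh_type == SHT_NOBITS)
            break; /* e.g. .bss: occupies no bytes in the file */
        result = malloc(len + 1);
        if (result != NULL &&
            fseek(f, (long)sh[i].sh_offset, SEEK_SET) == 0 &&
            fread(result, 1, len, f) == len) {
            result[len] = '\0';
            *size = len;
        } else {
            free(result);
            result = NULL;
        }
        break;
    }

done:
    free(names);
    free(sh);
    fclose(f);
    return result;
}
```

After something like objcopy --add-section .magic=hello.txt test test, calling read_elf_section("/proc/self/exe", ".magic", &n) from inside the program should hand back the text. I went with reading the file rather than dl_iterate_phdr because the latter only sees loadable segments, which the added section would not be part of unless you also arrange for it to be loaded.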

As a side note: this is similar to how code signing works on macOS. A signature is generated and stored in a dedicated region of the binary (located via the LC_CODE_SIGNATURE load command) to be read by the system on launch.

Wai Ha Lee
Jon Reeves