Background: I am looking at developing a package manager similar to Portage in Gentoo Linux (I may end up forking Portage). For those who know little about Gentoo, it is a source-based distro, which means that all packages are compiled from source code. Currently it is possible to compile a program into object files and then link those into an executable:

$ gcc -c  a.c -o a.o
$ gcc -c  b.c -o b.o
$ gcc a.o b.o -o executable

The improvements I would like to make to portage are the following.

  1. Ability to re-compile only the object files whose sources have been updated (tracking changes using Git or otherwise).
  2. Decompile/unlink an executable back into object files.
  3. Re-compile/re-link, replacing only the old object files with the updated ones (changes tracked using Git or otherwise).
  4. The newly linked package then replaces the old package (a trivial task).
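A minimal sketch of step 1, assuming change tracking by content hash rather than Git. The function names and the JSON manifest file are hypothetical, introduced only for illustration; the sketch also ignores header dependencies, which a real build would have to track (for example through gcc's `-MMD` output).

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def stale_sources(sources, manifest_path: Path):
    """Return the sources whose contents differ from the recorded digests.

    `manifest_path` is a hypothetical JSON file mapping source path -> digest
    from the previous build. A missing manifest means every source is stale.
    """
    manifest_path = Path(manifest_path)
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    return [s for s in sources if old.get(str(s)) != file_digest(s)]


def record_build(sources, manifest_path: Path) -> None:
    """Store the current digests after a successful build."""
    digests = {str(s): file_digest(s) for s in sources}
    Path(manifest_path).write_text(json.dumps(digests, indent=2))
```

Only the files returned by `stale_sources` would need a fresh `gcc -c`; the remaining object files could be reused as they are, provided they are still available.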

Reasoning: I am an Arch Linux user who loves the idea of a source-based distribution but cannot be bothered with the enormous task of keeping my system up to date. I also do most of my work on a laptop with a small hard drive, hence the idea of decompiling/unlinking the executable into object files rather than just keeping the object files, which take up a large amount of space. It would also likely decrease the overall compile time of the system, as the need to re-compile most of the source code would be greatly reduced. It would also allow an easy way to change the USE flags on a package without completely re-compiling it.

Question: Is it possible to compile object files into an executable and then decompile it back into object files? An example of this is below.

$ gcc -c  a.c -o a.o
$ gcc -c  b.c -o b.o
$ gcc a.o b.o -o executable

and then

$ SomeCommand executable 
output << a.o b.o

If this is not currently possible, would it be doable to modify a version of GNU's linker (`ld`) to log the changes it makes when linking object files, so as to intentionally make the program "reverse-engineerable"?

Edit: Another use for this would be to separate a single object file from the executable of a large project, swap it for a new one, and re-link. This would reduce the overhead of re-linking a large project built from many files when only one has been updated, allowing incremental compilation at the binary level.

silvergasp
  • Hmm, perhaps you can make a custom linker script that keeps the names of the .o files and associates them with the .text, .data etc. sections. Then later you could extract and replace them with `elfsh` (http://www.eresi-project.org/wiki/TheELFsh). How exactly it could work is beyond my abilities though ^_^ – Johannes Schaub - litb Sep 03 '15 at 10:29
  • How do you expect to recover, from a small executable, the .obj files that (as you said yourself) are pretty large? I don't think the linker shrinks the executable through any kind of compression; it just discards the unneeded information. So if you were to make it embed everything required to reverse the process, the executable size would grow correspondingly, making the whole process useless. – rr- Sep 03 '15 at 10:33
  • Normally you lose information when you link. For one, the sections of each object file are combined into the same output section, turning them into a single blob. If you did not do that, I suppose it would be possible to take the executable apart again (given that you know which sections belong to which object file). – skyking Sep 03 '15 at 10:36
  • Don't forget about the `.a` files :p – Johannes Schaub - litb Sep 03 '15 at 10:48

2 Answers

No, this is not possible. A large amount of the linker's work is replacing symbolic references (valid for any combination of object files being linked together) with numeric offsets (valid only for the particular way the linker decided to lay out that particular combination of object files, that particular time). Once the references are "baked" in this way, they cannot be recovered.
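To illustrate the point, here is a toy model of linking (not real ELF, and not part of the original answer). The `ObjectFile` class and `link` function are hypothetical names, and "addresses" are simply list indices. It shows two different sets of inputs producing an identical image, so the image alone cannot say which inputs it came from.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Toy object file: words of code plus symbolic bookkeeping."""
    name: str
    words: list                                  # 0 is a placeholder where a reference goes
    defines: dict = field(default_factory=dict)  # symbol -> offset inside this object
    relocs: dict = field(default_factory=dict)   # offset inside this object -> symbol


def link(objects):
    """Concatenate objects and replace symbolic references with absolute offsets."""
    symtab, base = {}, 0
    for obj in objects:                  # pass 1: lay out objects, compute symbol addresses
        for sym, off in obj.defines.items():
            symtab[sym] = base + off
        base += len(obj.words)

    image = []
    for obj in objects:                  # pass 2: patch references, then drop the metadata
        words = list(obj.words)
        for off, sym in obj.relocs.items():
            words[off] = symtab[sym]
        image.extend(words)
    return image


# a.o refers to `foo`, which b.o defines.
a = ObjectFile("a.o", words=[1, 0], relocs={1: "foo"})
b = ObjectFile("b.o", words=[7], defines={"foo": 0})

# A different, single object that merely contains the constant 2.
c = ObjectFile("c.o", words=[1, 2, 7])

print(link([a, b]))  # [1, 2, 7]
print(link([c]))     # [1, 2, 7]
```

In the first image the `2` is the resolved address of `foo`; in the second it is plain data. Nothing in the output distinguishes the two cases, and the object boundaries and symbol names are gone as well.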

Sneftel
  • GNU ld has the option `--emit-relocs` which may make it work. He then could try to undo the relocations. Not sure whether it works for all relocations though. – Johannes Schaub - litb Sep 03 '15 at 10:39
  • @JohannesSchaub-litb Good point, but with the relocation sections left in, the one halfway-reasonable motivation for this (reducing size on disk) would pretty much disappear. – Sneftel Sep 03 '15 at 10:43
  • But what if this data was stored in a separate file (i.e. a file that stored the numerical offsets that replaced the symbolic references)? Basically, what I am looking for is a way to make the linker log what it does, in which case the changes could simply be reversed. Or am I missing something? – silvergasp Sep 03 '15 at 10:45
  • No, you could certainly write a linker this way. But the result wouldn't really have any significant advantage over keeping the object files around. Certainly the disk usage would be comparable. – Sneftel Sep 03 '15 at 11:12
  • @Sneftel Your comment answered my question; if you'd be happy to put that into your answer, I will accept it... Not sure why I've been downvoted for this though, it seemed like a reasonable question to ask :P Thanks for your time – silvergasp Sep 03 '15 at 11:35

It might be doable if you alter/configure ld to keep the sections of each object file apart and also keep each object file's relocation table in the executable. You also have to make sure ld stores the object file names in the executable if you want to recover the original file names.

Basically, a linker could just join the object files together and then apply the relocations; if the relocations are invertible, you should be able to reverse the process.
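A toy sketch of that idea, under loud assumptions: this is not real ELF and not how ld works. The functions `link_with_log` and `unlink` are hypothetical, "addresses" are list indices, and every unresolved reference is assumed to be a zero placeholder (no addends stored in place).

```python
def link_with_log(objects):
    """Link toy objects and return (image, log), where the log can undo the link.

    Each object is a dict with keys "name", "words", "defines" (symbol -> offset)
    and "relocs" (offset -> symbol).
    """
    symtab, base = {}, 0
    for obj in objects:                  # pass 1: compute absolute symbol addresses
        for sym, off in obj["defines"].items():
            symtab[sym] = base + off
        base += len(obj["words"])

    image, log = [], []
    for obj in objects:                  # pass 2: patch references and record what was done
        words = list(obj["words"])
        for off, sym in obj["relocs"].items():
            words[off] = symtab[sym]
        log.append({
            "name": obj["name"],
            "start": len(image),         # where this object begins in the image
            "size": len(words),
            "defines": dict(obj["defines"]),
            "relocs": dict(obj["relocs"]),
        })
        image.extend(words)
    return image, log


def unlink(image, log):
    """Rebuild the toy objects from the image plus the log."""
    objects = []
    for entry in log:
        words = image[entry["start"]:entry["start"] + entry["size"]]
        for off in entry["relocs"]:
            words[off] = 0               # restore the unresolved placeholder
        objects.append({
            "name": entry["name"],
            "words": words,
            "defines": dict(entry["defines"]),
            "relocs": dict(entry["relocs"]),
        })
    return objects


a = {"name": "a.o", "words": [1, 0], "defines": {}, "relocs": {1: "foo"}}
b = {"name": "b.o", "words": [7], "defines": {"foo": 0}, "relocs": {}}

image, log = link_with_log([a, b])
print(unlink(image, log) == [a, b])  # True
```

Note that the log has to carry the file names, object boundaries, symbol definitions and relocation entries, which is essentially the metadata that distinguishes an object file from its raw code. That is why the size objection raised in the comments still applies.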

skyking