
I was writing some assembly code for a project of mine and noticed something interesting: the linked binary is very large. I kept testing, and even with the smallest possible program the output ELF binary is still large. For example:

.section .text
.global _start
_start:
    movl $1,%eax        # system call number 1: exit in the 32-bit int $0x80 ABI
    movl $0,%ebx        # exit status 0
    int $0x80           # invoke the kernel

After assembling and linking the code above, the resulting binary is more than 4 KB! The funny thing is that most of the binary is filled with zeroes.
I tried many things to find the cause, without success.
Can someone please explain what is going on here?

I simply assemble and link the file:

as -o <OBJ_NAME> <SOURCE_NAME>
ld -o <ELF_NAME> <OBJ_NAME>

Recommendations of any resources for further reading would be nice.

As you may have guessed, I use 64-bit GNU/Linux.

Thanks.

Peter Cordes
Amir H
  • Between "too broad" and off-topic for requesting off-site resources, I choose "too broad". But try googling "elf size linux". – John Bollinger Jul 09 '19 at 17:04
  • @JohnBollinger I find the question quite specific. There are clear instructions on how to reproduce and the unexpected result the OP does not understand (about 4 KB of zeroes in the executable). Not sure what off-site resources you are referring to either. – Vladislav Ivanishin Jul 09 '19 at 17:10
  • @VladislavIvanishin, "recommending any form of resource for further reading will be nice." As for breadth, yes, the question presents a specific example, but that does not in itself make the question a specific one, especially if you take the question to be the one in the title. – John Bollinger Jul 09 '19 at 17:18
  • @JohnBollinger Oh, I missed the part about recommendation when I read the question; my bad. I agree that the title could use some improvement, it does sound very broad. – Vladislav Ivanishin Jul 09 '19 at 17:32
  • If you're making x86-64 code, I'd suggest using 64-bit `syscall` to make an exit system call, not the 32-bit `int 0x80` ABI. – Peter Cordes Jul 10 '19 at 01:18

1 Answer


This has to do with alignment. See the output of readelf -eW <ELF_NAME>. The interesting bit is:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000401000 001000 00000c 00  AX  0   0  1

Note the Off column. This is the offset within the file, and the .text section starts at 0x1000, which is 4 KiB.

You see the same picture in the program headers. The zero-filled space lies between the end of the ELF header and offset 0x1000.
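
If you would rather inspect the program headers programmatically, here is a minimal Python sketch that lists the loadable segments of a little-endian ELF64 image. The function name `load_segments` is my own, and the field layouts are taken from man elf; this is only an illustration, and readelf remains the tool to trust.

```python
import struct

# Field layouts of Elf64_Ehdr and Elf64_Phdr (little-endian), as described in `man elf`.
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
PHDR = struct.Struct("<IIQQQQQQ")
PT_LOAD = 1  # program header type of a loadable segment


def load_segments(data):
    """Return (p_offset, p_vaddr, p_filesz, p_align) for every PT_LOAD segment."""
    fields = EHDR.unpack_from(data, 0)
    ident = fields[0]
    # e_ident: magic bytes, then class (2 = 64-bit), then encoding (1 = little-endian).
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    segments = []
    for i in range(e_phnum):
        (p_type, _flags, p_offset, p_vaddr, _paddr,
         p_filesz, _memsz, p_align) = PHDR.unpack_from(data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append((p_offset, p_vaddr, p_filesz, p_align))
    return segments
```

Calling `load_segments(open("<ELF_NAME>", "rb").read())` on the executable should report the same offsets and addresses as the LOAD rows printed by readelf.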

Why is this?

First, because the ELF standard dictates that

Loadable process segments must have congruent values for p_vaddr and p_offset, modulo the page size.

(see man elf). The page size on your system (mine as well) is 4K. This is the value that you see in p_align.

Second, the virtual address the linker has assigned to the start of the "text" segment is 0x0000000000401000 (the same as for the .text section, because that is all the segment contains here). Therefore the hexadecimal representation of the "text" segment's offset in the file has to end in 000. Offset 0 is already taken by the read-only segment containing the ELF header (the very beginning of the file), so the next candidate is 0x1000.
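
The arithmetic behind that choice can be sketched in a few lines of Python. This is only an illustration of the congruence rule, not what ld actually runs; the function name, the 4 KiB page size and the assumed end of the headers (a 64-byte ELF header plus two 56-byte program headers) are my own assumptions.

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB page size


def smallest_congruent_offset(vaddr, min_offset, page_size=PAGE_SIZE):
    """Smallest file offset >= min_offset with offset % page_size == vaddr % page_size."""
    remainder = vaddr % page_size
    # Start at the page containing min_offset and apply the required remainder.
    candidate = min_offset - (min_offset % page_size) + remainder
    if candidate < min_offset:
        # That position is already occupied, so move to the next page.
        candidate += page_size
    return candidate


# Assumption: the headers occupy a 64-byte ELF header plus two 56-byte program headers.
headers_end = 64 + 2 * 56  # 0xb0

print(hex(smallest_congruent_offset(0x401000, headers_end)))  # 0x1000
# A hypothetical address that is not page-aligned would need no padding at all.
print(hex(smallest_congruent_offset(0x4000b0, headers_end)))  # 0xb0
```

With a page-aligned address such as 0x401000, the first offset that satisfies the rule and does not collide with the headers is a whole page into the file, which is where the padding comes from.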

Why did the linker choose 0x401000 as the virtual address for the text section? I don't know. I think that if you tweak the linker script a little, you will be able to get a smaller resulting executable.


As Peter and that other guy have pointed out, page-size alignment can be disabled using the -n linker option:

'-n'
'--nmagic'
    Turn off page alignment of sections, and disable linking against
    shared libraries[…]

That way I get

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 1] .text             PROGBITS        0000000000400078 000078 00000c 00  AX  0   0  1

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000078 0x0000000000400078 0x0000000000400078 0x00000c 0x00000c R E 0x1

and the size of the executable is down to 664 bytes (344 after stripping).
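
The offset 0x78 in that listing looks like nothing more than the size of the headers in front of the code. Here is a small Python sketch of that arithmetic, assuming the standard ELF64 layouts from man elf and assuming that the program header table directly follows the ELF header (the helper name `end_of_headers` is mine):

```python
import struct

# Sizes of Elf64_Ehdr and Elf64_Phdr, computed from their field layouts in `man elf`.
EHDR_SIZE = struct.calcsize("<16sHHIQQQIHHHHHH")  # 64 bytes
PHDR_SIZE = struct.calcsize("<IIQQQQQQ")          # 56 bytes


def end_of_headers(num_program_headers):
    """File offset just past the ELF header and the program header table."""
    return EHDR_SIZE + num_program_headers * PHDR_SIZE


print(hex(end_of_headers(1)))  # 0x78
print(hex(end_of_headers(2)))  # 0xb0
```

With a single program header, the first free byte is at 0x78, which matches the offset of .text shown above.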


With GNU ld, you can use linker scripts to finely control the layout of linker output files. ld.bfd (usually also known as just ld) uses a default linker script if the user doesn't specify one. The default script can be printed with ld --verbose. You can then edit it and supply your version instead of the default with -T <your-script>.

I edited out the first occurrence of

. = ALIGN(CONSTANT (MAXPAGESIZE));

(before .text) and got 720 bytes (400 when stripped). This is different from the result of using the -n option. You still get 2 loadable segments, and their p_align is still 0x1000.

There are efficiency implications of having p_align < MAXPAGESIZE that I don't fully understand. (Are pages loaded more slowly because the address computation is harder? I think there should be a better explanation.) Feel free to edit the answer if you know more about this or where it is explained.

Vladislav Ivanishin
  • I think you want `ld --nmagic` to turn off page-alignment of sections, like the other answer is doing accidentally. – Peter Cordes Jul 10 '19 at 01:19
  • +1 Since this is the correct analysis while I was way off, feel free to copy the useful parts from my answer and let's see if OP or a mod will flip the accepted answer. – that other guy Jul 10 '19 at 01:37
  • Thanks! I incorporated the `-n` approach into my answer and also described another one using a custom linker script. – Vladislav Ivanishin Jul 10 '19 at 10:18