As a simple example of my problem, let's say we have two data arrays to embed into an executable to be used in a C program: chars and shorts. These data arrays are stored on disk as chars.raw and shorts.raw.

Using objcopy I can create object files that contain the data.

objcopy --input binary --output elf64-x86-64 chars.raw char_data.o
objcopy --input binary --output elf64-x86-64 shorts.raw short_data.o

objdump shows that the data is correctly stored and exported as _binary_chars_raw_start, _binary_chars_raw_end, and _binary_chars_raw_size.

$ objdump -x char_data.o 

char_data.o:     file format elf64-x86-64
char_data.o
architecture: i386:x86-64, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .data         0000000e  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 g       .data  0000000000000000 _binary_chars_raw_start
000000000000000e g       .data  0000000000000000 _binary_chars_raw_end
000000000000000e g       *ABS*  0000000000000000 _binary_chars_raw_size

(Similar output for short_data.o)

However, when I link these object files with my code into an executable, I run into problems. For example:

#include <stdio.h>

extern char _binary_chars_raw_start[];
extern char _binary_chars_raw_end[];
extern int _binary_chars_raw_size;

extern short _binary_shorts_raw_start[];
extern short _binary_shorts_raw_end[];
extern int _binary_shorts_raw_size;

int main(int argc, char **argv) {
        printf("%ld == %ld\n", _binary_chars_raw_end - _binary_chars_raw_start, _binary_chars_raw_size / sizeof(char));
        printf("%ld == %ld\n", _binary_shorts_raw_end - _binary_shorts_raw_start, _binary_shorts_raw_size / sizeof(short));
}

(compiled with gcc main.c char_data.o short_data.o -o main) prints

14 == 196608
7 == 98304

on my computer. The value of _binary_chars_raw_size (and of _binary_shorts_raw_size) is not correct, and I don't know why.

Similarly, if the _starts or _ends are used to initialize anything, then they may not even be located near each other in the executable (_end - _start is not equal to the size, and may even be negative).

What am I doing wrong?

pizzapants184

1 Answer

The lines:

extern char _binary_chars_raw_start[];
extern char _binary_chars_raw_end[];
extern int _binary_chars_raw_size;

extern short _binary_shorts_raw_start[];
extern short _binary_shorts_raw_end[];
extern int _binary_shorts_raw_size;

These symbols are not variables that hold the data or the size. They are symbols placed at the beginning and end of the region, so the addresses of these symbols are the start and end of the region. Do:

#include <stdint.h>
#include <stdio.h>

extern char _binary_chars_raw_start;
extern char _binary_chars_raw_end;
extern char _binary_chars_raw_size;

int main(void) {
    // print ptrdiff_t with %td
    printf("%td == %d\n",
           // the __difference in addresses__ of these variables
           &_binary_chars_raw_end - &_binary_chars_raw_start,
           // the address of _size is the value; cast via intptr_t to avoid a pointer-to-int warning
           (int)(intptr_t)&_binary_chars_raw_size);
    // note: also print size_t, like the result of `sizeof(..)`, with %zu
    return 0;
}

Edit: _size is also a symbol whose address, not its contents, is the value.

KamilCuk
  • This is correct in that `_size` is a pointer that is meant to be read directly as an integer, but it seems that its value is modified by the dynamic linker (?) unless I compile with `-static`. The value given by `objdump` is correct, but when the executable is run, the value is incorrect (it is off by the prefix that the `_start` and `_end` are relocated to, e.g. when `_start` is 0x555dcad710010 and `_end` is 0x555dcad75de8, `_size` should be 0x5dd8, but it is 0x555dcad71dd8 unless I compile with `-static`). I read that this may be a bug? https://sourceware.org/bugzilla/show_bug.cgi?id=19818 – pizzapants184 May 21 '20 at 17:54
  • (cont'd) I have another problem with some symbols being rearranged in my program (last paragraph of my question) , but I haven't found a minimal example with the same problem yet, so I don't really know how to explain what is wrong in that case. – pizzapants184 May 21 '20 at 17:55
  • [ASLR](https://en.wikipedia.org/wiki/Address_space_layout_randomization). Maybe size isn't a pointer? I didn't actually check it, let me grab my compiler. – KamilCuk May 21 '20 at 18:07
  • I found my issue and I think it's a bug in objcopy. I was using `objcopy --input binary --output elf64-x86-64 -i 4 --interleave-width=2 --byte 0` to only copy the first channel of a 16-bit raw audio file, so objcopy only copied half the file, BUT `_end` and `_size` still had the values as if the full file were copied, so I guess that caused the linker to move the symbols relative to each other since it didn't think they were supposed to be next to each other (since they weren't). I'm not too sure about any of this, but removing the `-i 4 --interleave-width=2 --byte 0` fixed it. – pizzapants184 May 21 '20 at 18:50