9

I've been delving deeper into Linux and C, and I'm curious how functions are stored in memory. I have the following function:

void test(){
    printf( "test\n" );
}

Simple enough. When I run objdump on the executable that has this function, I get the following:

08048464 <test>:
 8048464:       55                      push   %ebp
 8048465:       89 e5                   mov    %esp,%ebp
 8048467:       83 ec 18                sub    $0x18,%esp
 804846a:       b8 20 86 04 08          mov    $0x8048620,%eax
 804846f:       89 04 24                mov    %eax,(%esp)
 8048472:       e8 11 ff ff ff          call   8048388 <printf@plt>
 8048477:       c9                      leave
 8048478:       c3                      ret

Which all looks right. The interesting part is when I run the following piece of code:

int main( void ) {
    char data[20];
    int i;    
    memset( data, 0, sizeof( data ) );
    memcpy( data, test, 20 * sizeof( char ) );
    for( i = 0; i < 20; ++i ) {
        printf( "%x\n", data[i] );
    }
    return 0;
}

I get the following (which is incorrect):

55
ffffff89
ffffffe5
ffffff83
ffffffec
18
ffffffc7
4
24
10
ffffff86
4
8
ffffffe8
22
ffffffff
ffffffff
ffffffff
ffffffc9
ffffffc3

If I opt to leave out the memset( data, 0, sizeof( data ) ); line, then the right-most byte is correct, but some of them still have the leading 1s.

Does anyone have any explanation for why

  1. using memset to clear my array results in an incorrect (or inaccurate) representation of the function, and

  2. what is this byte stored as in memory? ints? char? I don't quite understand what's going on here. (clarification: what type of pointer would I use to traverse such data in memory?)

My immediate thought is that this is a result of x86 having instructions that don't end on a byte or half-byte boundary. But that doesn't make a whole lot of sense, and shouldn't cause any problems.

mkrieger1
Neil

5 Answers

6

I believe your chars are being sign-extended to the width of an integer. You might get results closer to what you want by explicitly casting the value when you print it.

Will
  • I don't believe that this is the case, due to the occasional values that do not exhibit the same behavior (i.e., the 55, 4, 18, etc.). If they were all sign-extended, then I would believe that would be the solution. – Neil Dec 31 '12 at 20:43
  • 4
    Those values have a high bit of zero. Extending a zero bit is sort of invisible. The problem ones have a high bit of one. – Lee Meador Dec 31 '12 at 20:46
  • I believe that you are looking at the hex of the sign-extended data. If the value is `0x00000055` then `printf` prints *55*, because `%x` drops leading zeros. If it's `0xFFFFFF89` then it prints the full value. If you want to ensure that all leading 0s are printed, use `"%08x"`. – Will Dec 31 '12 at 20:47
  • Wow, what a silly oversight. I forgot the unsigned keyword -_-' – Neil Dec 31 '12 at 20:50
4

Here is a much simpler version of what you were trying to do:

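#include <stdio.h>

void test(void);  /* the function from the question, defined elsewhere */

int main( void ) {
    /* View the function's machine code as raw bytes. Converting a function
       pointer to an object pointer is not guaranteed by ISO C, but works on
       Linux/x86. */
    unsigned char *data = (unsigned char *)test;
    int i;
    for( i = 0; i < 20; ++i ) {
        printf( "%02x\n", data[i] );
    }
    return 0;
}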

The changes I made are: removing your superfluous buffer and using a pointer to test instead; using unsigned char instead of char; and changing the printf to use %02x, so that it always prints two characters. (The %02x alone wouldn't fix the 'negative' numbers coming out as ffffff89 or so; that's fixed by the unsigned on the data pointer.)

All instructions in x86 end on byte boundaries, and the compiler will often insert extra "padding-instructions" to make sure branch-targets are aligned to 4, 8 or 16-byte boundaries for efficiency.

mkrieger1
Mats Petersson
1

Answer to 2.: a byte is stored as a byte in memory; each memory location holds exactly one byte. In C, the type for a raw byte is unsigned char, so an unsigned char * is the pointer type to use for traversing such data.

Hint: Pick up a good book on computer organization (my favorite is the one by Carl Hamacher) to get a good understanding of how memory is represented internally.

In your code:

memset( data, 0, sizeof( data ) ); // same as memset(data, 0, 20), since data is char[20]
memcpy( data, test, 20 * sizeof( char ) );
for( i = 0; i < 20; ++i ) {
    printf( "%x\n", data[i] ); // data[i] is a char promoted to int before printing; negative values are sign-extended, hence the extra `ffffff`
}
mkrieger1
Aniket Inge
  • a) an optimization in memset should not cause a call to memcpy to make a non-exact copy of the data. b) how would that be accessed from c? the closest thing to a byte type is unsigned char – Neil Dec 31 '12 at 20:48
1

The problem is in your code to print.

One byte is loaded from the data array. (one byte == one char)

The byte is converted to an 'int' because arguments narrower than int are promoted when they are passed to a variadic function such as 'printf'. Since char is signed here, the conversion sign-extends the byte to a 32-bit double-word. That's what gets printed out as hex. (This means a byte with the high bit set gets converted to a 32-bit value with bits 8-31 all set. Those are the ffffffxx values you see.)

What I do in this case is to convert it myself:

 printf( "%x\n", ((int)data[i] & 0xFF) );

Then it will print correctly. (If you were loading 16 bit values you'd AND with 0xffff.)

Lee Meador
0

The printing looks odd because you're printing signed values, so they're being sign extended.

However, the function being printed is also slightly different from the objdump listing. Instead of loading EAX with the address of the string and then storing it on the stack, it stores the address directly:

push        ebp  
mov         ebp,esp  
sub         esp,18h  
mov         dword ptr [esp],8048610h  
call        <printf>  
leave  
ret  

As to why it changes when you make seemingly benign changes elsewhere in the code - well, it's allowed to. That's why it's good not to rely on undefined behaviour.

JasonD
  • 1
    The thing it is loading into eax and then putting in the reserved space on the stack is the address of the string 'test\n' (0x8048620) – Lee Meador Dec 31 '12 at 20:50