
I'm practicing reverse engineering C object files. Suppose I have an object file of the C program:

#include <stdio.h>
#include <string.h>

int main (int argc, char ** argv) {
  char * input = argv[1];
  int result = strcmp(input, "text_to_compare");
  
  if (result == 0) {
      printf("%s\n", "text matches");
  }
  else {
      printf("%s\n", "text doeesn't match");
  }
  
  return 0;
}

How would I go about finding "text_to_compare" in the object file, given that it was compiled with the -g flag for the x86-64 architecture?

Said Hamed
  • This might be "too broad". But anyway, you would find the `main` function first, then the `call strcmp` in it, then work backwards to see how the argument was set, and that's where you will find your string. That's assuming you are actually looking for the "second argument to `strcmp`" and not just the string constant "text_to_compare", because that can simply be searched for. – Jester Oct 17 '22 at 21:30
  • As already said, search for how the compiler accesses the string literal; it will usually be a label or section name, maybe plus some offset. That will be where the string is stored. – Erik Eidt Oct 17 '22 at 21:34
  • Just read the disassembly. – Margaret Bloom Oct 17 '22 at 22:06

1 Answer


Running strings on a binary file will print all sequences of four or more printable characters in the file. For a simple file this might be sufficient, but for a larger file you can end up with a lot of false positives. For example, compiling your code with gcc and running strings on the resulting binary returns 295 results.
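
To make it clearer why strings produces so many hits, here is a minimal sketch of the kind of scan it performs. This is a simplified illustration, not the actual implementation in binutils; the function name next_printable_run and its parameters are hypothetical, and it only treats isprint() bytes as printable.

```c
#include <ctype.h>
#include <stddef.h>

/* Find the next run of at least min_len printable bytes at or after `start`.
 * Returns the offset of the run and stores its length in *run_len.
 * Returns len (and sets *run_len to 0) when no such run exists.
 * Assumes min_len >= 1. Hypothetical helper, for illustration only. */
size_t next_printable_run(const unsigned char *buf, size_t len,
                          size_t start, size_t min_len, size_t *run_len)
{
    size_t i = start;

    while (i < len) {
        size_t begin = i;

        /* Extend the run for as long as the bytes are printable. */
        while (i < len && isprint(buf[i]))
            i++;

        if (i - begin >= min_len) {
            *run_len = i - begin;
            return begin;
        }

        /* Run was empty: step over the non-printable byte. */
        if (i == begin)
            i++;
    }

    *run_len = 0;
    return len;
}
```

Any byte sequence that happens to look like text passes this test, including symbol names and compiler version strings, which is where the false positives come from.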

We can start by using the objdump command to disassemble the code in your sample file:

$ objdump --disassemble=main a.out

a.out:     file format elf64-x86-64


Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

0000000000401136 <main>:
  401136:       55                      push   %rbp
  401137:       48 89 e5                mov    %rsp,%rbp
  40113a:       48 83 ec 20             sub    $0x20,%rsp
  40113e:       89 7d ec                mov    %edi,-0x14(%rbp)
  401141:       48 89 75 e0             mov    %rsi,-0x20(%rbp)
  401145:       48 8b 45 e0             mov    -0x20(%rbp),%rax
  401149:       48 8b 40 08             mov    0x8(%rax),%rax
  40114d:       48 89 45 f8             mov    %rax,-0x8(%rbp)
  401151:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  401155:       be 10 20 40 00          mov    $0x402010,%esi
  40115a:       48 89 c7                mov    %rax,%rdi
  40115d:       e8 de fe ff ff          call   401040 <strcmp@plt>
  401162:       89 45 f4                mov    %eax,-0xc(%rbp)
  401165:       83 7d f4 00             cmpl   $0x0,-0xc(%rbp)
  401169:       75 0c                   jne    401177 <main+0x41>
  40116b:       bf 20 20 40 00          mov    $0x402020,%edi
  401170:       e8 bb fe ff ff          call   401030 <puts@plt>
  401175:       eb 0a                   jmp    401181 <main+0x4b>
  401177:       bf 2d 20 40 00          mov    $0x40202d,%edi
  40117c:       e8 af fe ff ff          call   401030 <puts@plt>
  401181:       b8 00 00 00 00          mov    $0x0,%eax
  401186:       c9                      leave
  401187:       c3                      ret

Disassembly of section .fini:

Looking at the disassembly, we can see a call to strcmp at address 40115d:

40115d:       e8 de fe ff ff          call   401040 <strcmp@plt>

If we look a couple of lines before that, we can see an instruction that loads an address outside of this section (0x402010) into %esi, the register that holds the second argument to strcmp:

401155:       be 10 20 40 00          mov    $0x402010,%esi
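
The address can also be read straight from the instruction bytes: be is the opcode for moving a 32-bit immediate into %esi, and the four bytes that follow hold the immediate in little-endian order. A small sketch of that decoding, where read_le32 is a hypothetical helper name:

```c
#include <stdint.h>

/* Assemble a 32-bit value from four bytes stored in little-endian order,
 * least significant byte first. Hypothetical helper, for illustration. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Calling read_le32 on the bytes 10 20 40 00 (skipping the leading be opcode byte) yields 0x402010, matching what objdump prints.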

If we look at the output of objdump -h a.out, we see that this address falls in the .rodata section (we're looking for the section whose address range, starting at the value in the VMA column and extending for Size bytes, contains the given address):

$ objdump -h a.out
Idx Name          Size      VMA               LMA               File off  Algn
[...]
 15 .rodata       00000041  0000000000402000  0000000000402000  00002000  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
[...]
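
The range check described above, and the matching position in the file, can be written out as a short sketch. The struct and function names here are hypothetical; the fields simply mirror the VMA, Size, and File off columns of the objdump -h output.

```c
#include <stdbool.h>
#include <stdint.h>

/* One row of `objdump -h` output, reduced to the columns we need.
 * Hypothetical type, for illustration only. */
struct section {
    uint64_t vma;       /* VMA column */
    uint64_t size;      /* Size column */
    uint64_t file_off;  /* File off column */
};

/* True when addr lies in [vma, vma + size). */
bool section_contains(const struct section *s, uint64_t addr)
{
    return addr >= s->vma && addr - s->vma < s->size;
}

/* Translate a virtual address inside the section to a file offset.
 * The caller is expected to check section_contains() first. */
uint64_t file_offset_of(const struct section *s, uint64_t addr)
{
    return s->file_off + (addr - s->vma);
}
```

With the .rodata values shown (VMA 0x402000, Size 0x41, File off 0x2000), address 0x402010 is inside the section and corresponds to file offset 0x2010.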

We can extract the data in that section using the objcopy command:

$ objcopy -j .rodata -O binary a.out >(xxd -o 0x402000)
00402000: 0100 0200 0000 0000 0000 0000 0000 0000  ................
00402010: 7465 7874 5f74 6f5f 636f 6d70 6172 6500  text_to_compare.
00402020: 7465 7874 206d 6174 6368 6573 0074 6578  text matches.tex
00402030: 7420 646f 6565 736e 2774 206d 6174 6368  t doeesn't match
00402040: 00                                       .

And we can see that the string at address 0x402010 is text_to_compare.
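
The same lookup can be sketched in C for cases where reading the hex dump by eye is impractical. The byte array below is copied from the dump above; string_at is a hypothetical helper that returns the NUL-terminated string stored at a given virtual address, or NULL if the address is outside the section or no terminator follows it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Contents of .rodata, copied from the xxd output above (0x41 bytes). */
static const unsigned char rodata[] = {
    0x01, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x74, 0x65, 0x78, 0x74, 0x5f, 0x74, 0x6f, 0x5f,
    0x63, 0x6f, 0x6d, 0x70, 0x61, 0x72, 0x65, 0x00,
    0x74, 0x65, 0x78, 0x74, 0x20, 0x6d, 0x61, 0x74,
    0x63, 0x68, 0x65, 0x73, 0x00, 0x74, 0x65, 0x78,
    0x74, 0x20, 0x64, 0x6f, 0x65, 0x65, 0x73, 0x6e,
    0x27, 0x74, 0x20, 0x6d, 0x61, 0x74, 0x63, 0x68,
    0x00
};

/* Return the C string stored at virtual address addr, given the section
 * bytes and the section's base address (its VMA). Returns NULL when addr
 * is out of range or no NUL terminator is found before the section ends.
 * Hypothetical helper, for illustration only. */
const char *string_at(const unsigned char *data, size_t size,
                      uint64_t base, uint64_t addr)
{
    if (addr < base || addr - base >= size)
        return NULL;

    size_t off = (size_t)(addr - base);
    if (memchr(data + off, '\0', size - off) == NULL)
        return NULL;

    return (const char *)(data + off);
}
```

Called as string_at(rodata, sizeof rodata, 0x402000, 0x402010), this returns "text_to_compare"; the addresses 0x402020 and 0x40202d passed to puts resolve to the two messages in the same way.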

larsks
  • @SaidHamed: Note that this answer is using `objcopy` on a linked executable, not a `.o` object file, which is why there are real addresses in the ELF section and program headers. (Also because it's a non-PIE executable). On a `.o`, you'd just get offsets relative to the start of a section. (Unless you made an offset for `xxd`). – Peter Cordes Oct 18 '22 at 02:46
  • To disassemble a `.o`, normally you'd use `objdump -drwC -Mintel`. The `-r` prints a relocation symbol-name as a comment next to instructions like `mov $0,%esi` where the `0` is just a placeholder address to be filled in by the linker. – Peter Cordes Oct 18 '22 at 02:47
  • `strings -n 8` will only print runs of 8 or more printable characters, which can be useful when the default is too noisy and you expect the string to be non-tiny. – Peter Cordes Oct 18 '22 at 02:48