For an assignment, I wrote the following assembly code shell_exec.asm that should execute a shell in Linux:

section .data ; declare stuff
  arg0 db "/bin/sh",0 ; 1st arg
  align 4
  argv dd arg0, 0 ; 2nd arg
  envp dd 0 ; 3rd arg

section .text
global _start
_start:
  mov eax, 0x0b ; execve
  mov ebx, arg0 ; 1st arg
  mov ecx, argv ; 2nd arg
  mov edx, envp ; 3rd arg
  int 0x80 ; kernel

I assembled it with nasm -f elf32 shell_exec.asm and linked it with ld -m elf_i386 -o shell_exec shell_exec.o. Everything works so far: if I run ./shell_exec, the shell spawns the way I want.

Now I wanted to extract the shell code (like \12\34\ab\cd\ef...) from this program. I used objdump -D -z shell_exec to show all parts of the code including the section .data and all zeroes. The output is as follows:

shell_exec:     file format elf32-i386


Disassembly of section .text:

08049000 <_start>:
 8049000:       b8 0b 00 00 00          mov    $0xb,%eax
 8049005:       bb 00 a0 04 08          mov    $0x804a000,%ebx
 804900a:       b9 08 a0 04 08          mov    $0x804a008,%ecx
 804900f:       ba 10 a0 04 08          mov    $0x804a010,%edx
 8049014:       cd 80                   int    $0x80

Disassembly of section .data:

0804a000 <arg0>:
 804a000:       2f                      das
 804a001:       62 69 6e                bound  %ebp,0x6e(%ecx)
 804a004:       2f                      das
 804a005:       73 68                   jae    804a06f <__bss_start+0x5b>
 804a007:       00                      add    %al,(%eax)

0804a008 <argv>:
 804a008:       00 a0 04 08 00 00       add    %ah,0x804(%eax)
 804a00e:       00 00                   add    %al,(%eax)

0804a010 <envp>:
 804a010:       00 00                   add    %al,(%eax)
 804a012:       00 00                   add    %al,(%eax)

If my assembly code only has a section .text, I can usually just copy all the listed byte values and use them as my shellcode. But in what order do the bytes go when I have two sections, namely .data and .text?

Edit 1

So, my second attempt is to write the assembly code like this:

section .text
global _start

_start:
mov ebp, esp

xor eax, eax
push eax ; -4
push "/sh " ; -8
push "/bin" ; -12

xor eax, eax
push eax
lea ebx, [ebp-12] 
push ebx ; 1st arg

mov ecx, esp ; 2nd arg
lea edx, [ebp-4] ; 3rd arg

mov eax, 0x0b ; execve
int 0x80 ; kernel

This avoids using multiple sections, but sadly leads to a segmentation fault.
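
A likely cause, offered as a sketch rather than a verified fix: push "/sh " stores the four bytes '/', 's', 'h' and a space, so the path on the stack reads "/bin/sh " with a trailing space. execve would then fail and return, and execution would run on past int 0x80 into whatever bytes follow. The variant below assumes the same 32-bit Linux setup and the same nasm and ld commands as above. It pads the path with a second slash instead of a space, and takes all pointers from esp instead of ebp.

```nasm
section .text
global _start

_start:
    xor  eax, eax        ; eax = 0
    push eax             ; four zero bytes that terminate the string
    push "//sh"          ; second half of the path: exactly 4 bytes, no space padding
    push "/bin"          ; first half; esp now points at "/bin//sh",0
    mov  ebx, esp        ; 1st arg: pointer to the path

    push eax             ; argv[1] = NULL
    push ebx             ; argv[0] = pointer to the path
    mov  ecx, esp        ; 2nd arg: argv

    lea  edx, [esp+4]    ; 3rd arg: envp, reusing the NULL in argv[1] as an empty list

    mov  al, 0x0b        ; execve; the upper bytes of eax are still zero
    int  0x80            ; kernel
```

The kernel treats "/bin//sh" the same as "/bin/sh", which is why the extra slash is a common way to fill the second dword. With everything in .text, the bytes objdump lists for _start can be copied in order, and this encoding should not contain zero bytes either.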

TiMauzi
  • Are you talking about dumping to shellcode when you say "bytecode"? Effectively turning this into a position-independent flat binary? Hmm, yes I think you are. (bytecode has a different meaning). You can't use a separate `.data` section for that; normal linking will put your `.data` at least a 4k page away from your `.text`, so unless you want your payload to include a huge amount of padding, it's not doable. – Peter Cordes Jan 17 '21 at 21:33
  • The order is defined by the linker script and the ELF loader doesn't care, as the important things are the addresses. If you want a single dump there is hardly any point in using multiple sections. – Jester Jan 17 '21 at 21:33
  • @PeterCordes So that would mean, I would have to do the assembly code only within one section with no workaround available for that? – TiMauzi Jan 17 '21 at 21:36
  • Of course there are workarounds, like simply putting your static data in `.text`. (Which in actual shellcode will be read+write+exec). They're not "declarations", they're actual directives that emit bytes into the output just like `xor eax,eax`. Of course, if you want your shell code to be free from `0` bytes, you normally want to construct it on the fly. And of course static initializers can't include absolute pointers unless you fix them up to exactly match the address you're injecting this at. (i.e. anything that needs a NOP slide would break this.) – Peter Cordes Jan 17 '21 at 21:39
  • Be careful, variable length instruction sets like x86 are painful to impossible to disassemble properly as can easily be seen with objdump. So if you are trying to do this programmatically you may still have to hand fix the output. Or you can write your own disassembler that works in execution order, but you will still not be able to achieve 100% accuracy. Even a hand written instruction set simulator isn't 100% because you can't generally force the code through all the paths. – old_timer Jan 17 '21 at 21:56
  • do you mean machine code when you are talking about bytecode? – old_timer Jan 17 '21 at 21:57
  • @old_timer You are probably right regarding the machine code term, I think this is just some language mistake :) – TiMauzi Jan 17 '21 at 22:11
  • @PeterCordes I added a second (but admittedly not very promising) attempt, where I tried not using two sections. But if I understand you correctly, there shouldn't be any static data like `"bin/"` at all, right? How would I get around that if I don't want to write the shellcode/bytecode/machine code by myself? – TiMauzi Jan 17 '21 at 22:24
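
The workaround described in the comments (static data kept in `.text`, with no absolute pointers) can be sketched as follows. This is an illustration under assumptions, not code from the thread: the labels `get_addr`, `run` and `path` are made up for the example, and it uses the common jmp/call/pop idiom so that the string's address is discovered at run time instead of being fixed by the linker.

```nasm
section .text
global _start

_start:
    jmp  short get_addr  ; jump forward to the call placed just before the string

run:
    pop  ebx             ; 1st arg: return address pushed by call = address of path
    xor  eax, eax        ; eax = 0
    push eax             ; argv[1] = NULL
    push ebx             ; argv[0] = pointer to the path
    mov  ecx, esp        ; 2nd arg: argv
    lea  edx, [esp+4]    ; 3rd arg: envp, reusing the NULL in argv[1] as an empty list
    mov  al, 0x0b        ; execve; the upper bytes of eax are still zero
    int  0x80            ; kernel

get_addr:
    call run             ; pushes the address of the next byte, which is path
path:
    db "/bin/sh", 0      ; static data emitted straight into .text
```

The string is only read here, so this also runs as a normal ELF binary where `.text` is not writable. The terminating `0` is still a zero byte in the dump; avoiding it would mean constructing the terminator at run time, as the comment above suggests.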