
I'm working on an x86 asm obfuscator that takes Intel-syntax code as a string and outputs an equivalent set of obfuscated opcodes.

Here's an example:

mov eax, 0x5523
or eax, [ebx]
push eax
call someAPI

Becomes something like:

mov eax, 0xFFFFFFFF ; mov eax, 0x5523
and eax, 0x5523     ;
push [ebx]          ; or eax, [ebx]
or [esp], eax       ;
pop eax             ;
push 12345h         ; push eax
mov [esp], eax      ;
call getEIP         ; call someAPI
getEIP:             ;
add [esp], 9        ;
jmp someAPI         ;

This is just an example; I've not checked that this doesn't screw up flags (it probably does).

Right now I have an XML document that lists instruction templates (e.g. push e*x) and a list of replacement instructions that can be used.

What I'm looking for is a way to automatically generate opcode sequences that produce the same result as an input. I don't mind doing an educated bruteforce, but I'm not sure how I'd approach this.

Polynomial
  • What are you hoping to achieve by doing this? – jalf Oct 30 '11 at 22:08
  • _"I'm working on an x86 asm obfuscator"_ <- did that not give it away? :P – Polynomial Oct 31 '11 at 08:52
  • mmm, no. Honestly not. I don't see what problem it is supposed to solve. It seems like you are making your code *dramatically* slower, to solve a non-problem. If it is a small, simple program, and it was hand-written in ASM in the first place, it might be viable to reverse-engineer it from that. But any program *worth* reverse-engineering is complex enough that doing so would be impractical. And assuming the code goes through a compiler with optimizations enabled, it is pretty well obfuscated already. So no, I don't see why you need to obfuscate x86 assembly. – jalf Oct 31 '11 at 08:55
  • Combining anti-debugger tricks with obfuscated code is useful in copy-protection applications. A simple example would be `and eax, 0x40; cmp eax, 0x40`, which is pretty obviously a flag mask test. When obfuscated (e.g. to `push eax; xor [esp], 0xFFFFFFFF; and [esp], 0x40; pop eax; cmp eax, 0`) this would be significantly more difficult to understand, especially if the entire control flow is also obfuscated through indirect jumps and call tables. As long as you're obfuscating short or infrequently used code sequences, the performance hit should be negligible. – Polynomial Oct 31 '11 at 09:12
  • Well, fine, it's your product. If you believe in what's really nothing more than superstitious "anti-pirate" mumbo-jumbo, feel free to cripple your own product with it. I'm sure all the pirates are absolutely quaking in their pants at the prospect of *yet another* copy-protection scheme which works no differently from all the ones that have been cracked within days or hours for the last 30 years. I'm just trying to understand the use case where this would actually *benefit you*. – jalf Oct 31 '11 at 09:23
  • Your sarcasm isn't constructive. I'm aware that pirates can easily bypass stuff like this. The point is that it makes their job a little harder, and I can't be blamed for not including some form of anti-copy when shit rolls downhill. I also don't see how this cripples anything - the performance hit will be absolutely minuscule. – Polynomial Oct 31 '11 at 09:28
  • You can absolutely be blamed for introducing complexity (potential subtle bugs, and this would *not* be the first copy-protection measure to contain severe bugs), and for slowing down the product for no reason. And of course, for spending resources (yours or your company's) on something with zero benefit. The code only needs to be cracked *once*. And if my sarcasm was constructive, I'd have put it in an answer. Comments don't need to be constructive. ;) (But tbh, asking why you're doing something utterly pointless is fairly constructive. Much more so than trying to do something pointless) – jalf Oct 31 '11 at 09:32
  • A fair point, but there's still no reason to be downright rude. Your derisive tone just makes me switch off. If you want to be taken seriously, work on the attitude. – Polynomial Oct 31 '11 at 09:39
  • http://www.mendeley.com/research/binob-framework-potent-stealthy-binary-obfuscation-3/ – Ira Baxter Nov 01 '11 at 17:41

2 Answers


What you need is an algebraic description of what the opcodes do, and a set of algebraic laws that allow you to determine equivalent operations.

Then for each instruction, you look up its algebraic description. For the sake of an example, take an

 XOR  eax,mem[ecx]

whose algebraic equivalent is

 eax exclusive_or mem[ecx]

and enumerate algebraic equivalences using those laws, such as:

 a exclusive_or b ==> (a and not b) or (b and not a)

to generate an equivalent algebraic statement for your XOR instruction

 eax exclusive_or mem[ecx] ==> (eax and not mem[ecx]) or (mem[ecx] and not eax)

You may apply more algebraic laws to this, for instance De Morgan's theorem:

 a or b ==> not (not a and not b)

to get

 not ((not (eax and not mem[ecx])) and (not (mem[ecx] and not eax)))

At this point you have a specification of an algebraic computation that will do the same thing as the original. There's your brute force.
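The enumeration step can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: expressions are nested tuples, the two rules are the ones quoted above, and names such as `enumerate_equivalents` and `equivalent` are made up for the sketch. Equivalence is checked by random testing on 32-bit values, which gives confidence but is not a proof.

```python
import random

MASK = 0xFFFFFFFF  # model 32-bit values

# Expressions are nested tuples (a hypothetical encoding for this sketch):
# ("var", name), ("not", e), ("and", a, b), ("or", a, b), ("xor", a, b)

def evaluate(e, env):
    op = e[0]
    if op == "var":
        return env[e[1]] & MASK
    if op == "not":
        return ~evaluate(e[1], env) & MASK
    a, b = evaluate(e[1], env), evaluate(e[2], env)
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "xor":
        return a ^ b
    raise ValueError(f"unknown operator: {op}")

def expand_xor(e):
    # a xor b ==> (a and not b) or (b and not a)
    if e[0] == "xor":
        a, b = e[1], e[2]
        return ("or", ("and", a, ("not", b)), ("and", b, ("not", a)))
    return None

def de_morgan(e):
    # a or b ==> not (not a and not b)
    if e[0] == "or":
        return ("not", ("and", ("not", e[1]), ("not", e[2])))
    return None

RULES = [expand_xor, de_morgan]

def rewrites(e):
    # Yield every expression obtained by applying one rule at one node.
    for rule in RULES:
        out = rule(e)
        if out is not None:
            yield out
    if e[0] != "var":
        for i in range(1, len(e)):
            for sub in rewrites(e[i]):
                yield e[:i] + (sub,) + e[i + 1:]

def enumerate_equivalents(e, depth):
    # Breadth-first: all expressions reachable in at most `depth` rewrites.
    seen, frontier = {e}, [e]
    for _ in range(depth):
        nxt = []
        for cur in frontier:
            for out in rewrites(cur):
                if out not in seen:
                    seen.add(out)
                    nxt.append(out)
        frontier = nxt
    return seen

def equivalent(e1, e2, names, trials=200, seed=0):
    # Random testing only: agreement on all trials is evidence, not proof.
    rng = random.Random(seed)
    for _ in range(trials):
        env = {n: rng.getrandbits(32) for n in names}
        if evaluate(e1, env) != evaluate(e2, env):
            return False
    return True
```

Starting from `("xor", ("var", "eax"), ("var", "mem"))` with `depth=2`, this visits the original, the and/or expansion, and its De Morgan form. With more rules the set grows quickly, so a real tool would probably want to cap the depth or sample randomly.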

Now you have to "compile" this to machine instructions by matching what the instructions do against what this expression says. Like any compiler, you likely want to optimize the generated code (no point in fetching mem[ecx] twice). (All of this is hard... it's a code generator!) The resulting code sequence would be something like:

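mov ebx, mem[ecx]   ; ebx = m
mov edx, ebx
not edx             ; edx = not m
and edx, eax        ; edx = eax and not m
not edx             ; edx = not (eax and not m)
not eax
and eax, ebx        ; eax = m and not eax
not eax             ; eax = not (m and not eax)
and eax, edx
not eax             ; eax = original eax xor m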
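A cheap way to catch mistakes in a hand-derived sequence like this is to run it on a toy model of the machine and compare the result with a plain XOR. The sketch below is an assumption-laden illustration, not a real emulator: it models only `mov`, `not`, `and`, `or` and `xor` on four registers, ignores flags entirely, and the tuple encoding and the names `run` and `same_eax` are invented for the example. The sequence it checks is the De Morgan form of XOR.

```python
import random

MASK = 0xFFFFFFFF  # 32-bit registers

def run(program, regs, mem):
    # program: list of (op, dst, src); src is a register name,
    # ("mem", base_register) for a load, or None for one-operand ops.
    regs = dict(regs)

    def read(src):
        if isinstance(src, tuple):
            return mem[regs[src[1]]] & MASK
        return regs[src]

    for op, dst, src in program:
        if op == "mov":
            regs[dst] = read(src)
        elif op == "not":
            regs[dst] = ~regs[dst] & MASK
        elif op == "and":
            regs[dst] &= read(src)
        elif op == "or":
            regs[dst] |= read(src)
        elif op == "xor":
            regs[dst] ^= read(src)
        else:
            raise ValueError(f"unsupported op: {op}")
    return regs

PLAIN_XOR = [("xor", "eax", ("mem", "ecx"))]

# not ((not (eax and not m)) and (not (m and not eax)))
DE_MORGAN_XOR = [
    ("mov", "ebx", ("mem", "ecx")),
    ("mov", "edx", "ebx"),
    ("not", "edx", None),
    ("and", "edx", "eax"),
    ("not", "edx", None),
    ("not", "eax", None),
    ("and", "eax", "ebx"),
    ("not", "eax", None),
    ("and", "eax", "edx"),
    ("not", "eax", None),
]

def same_eax(prog_a, prog_b, trials=1000, seed=0):
    # Compare only the final eax on random inputs (evidence, not proof).
    rng = random.Random(seed)
    for _ in range(trials):
        regs = {r: rng.getrandbits(32) for r in ("eax", "ebx", "edx")}
        regs["ecx"] = 0x1000
        mem = {0x1000: rng.getrandbits(32)}
        if run(prog_a, regs, mem)["eax"] != run(prog_b, regs, mem)["eax"]:
            return False
    return True
```

`same_eax(DE_MORGAN_XOR, PLAIN_XOR)` compares only `eax`. The obfuscated sequence also clobbers `ebx` and `edx` and leaves different flags behind, which is exactly why the free-register question matters.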

This is a lot of machinery to build manually.

Another way to do this is to take advantage of a program transformation system that allows you to apply source-to-source transformations to code. Then you can encode "equivalences" as rewrites directly on the code.

One of these tools is our DMS Software Reengineering Toolkit.

DMS takes a language definition (essentially as an EBNF) and automatically implements a parser, AST builder, and prettyprinter (an anti-parser, turning the AST back into valid source text). [DMS doesn't presently have an EBNF for ASM86, but dozens of EBNFs for various complex languages have been built for DMS, including several for miscellaneous non-x86 assemblers. So you'd have to define the ASM86 EBNF to DMS. This is pretty straightforward; DMS has a really strong parser generator.]

Using that, DMS will let you write source transformations directly on the code. You could write the following transformations that implement the XOR equivalent and De Morgan's law directly:

  domain ASM86;

  rule obfuscate_XOR(r: register, m: memory_access):instruction:instruction
  =  " XOR \r, \m " 
      rewrites to
     " MOV \free_register\(\),\m
       NOT \free_register\(\)
       AND \free_register\(\),\r 
       NOT \r
       AND \r,\m
       OR \r,\free_register\(\)";

 rule obfuscate_OR(r1: register, r2: register):instruction:instruction
 = " OR \r1, \r2"
     rewrites to
    " MOV \free_register\(\),\r2
      NOT \free_register\(\)
      NOT \r1
      AND \r1,\free_register\(\)
      NOT \r1";

with some additional magic in a meta-procedure called "free_register" that determines what registers are free at that point (of the AST match) in the code. (If you don't want to do that, use the top of the stack as a temporary everywhere, as you did in your example.)
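To give a feel for what such a "free_register" helper has to decide, here is a deliberately conservative Python sketch. It is not how DMS does it; it is a hypothetical stand-in that works on normalized text lines, knows only full 32-bit register names (no `al`/`ax` aliases), and only calls a register free when its next mention in straight-line code is a plain `mov reg, src` overwrite. Anything it cannot see through (calls, jumps, labels, end of block) is treated as "still live".

```python
import re

REGS = ("eax", "ebx", "ecx", "edx", "esi", "edi")

# Control transfers and labels end the straight-line region we can reason about.
CONTROL_FLOW = re.compile(r"^(call|ret|jmp|j\w+|loop\w*)\b|:$")

def free_registers(lines, index):
    """Registers that look safe to clobber while rewriting lines[index]."""
    free = []
    for reg in REGS:
        if reg in lines[index]:
            continue  # used by the instruction being rewritten
        for line in lines[index + 1:]:
            if CONTROL_FLOW.search(line):
                break  # unknown successor: assume the register is live
            if reg not in line:
                continue
            overwrite = re.match(rf"^mov {reg}, (.+)$", line)
            if overwrite and reg not in overwrite.group(1):
                free.append(reg)  # old value is overwritten before any read
            break  # the first mention decides either way
    return free
```

A real implementation would want proper operand parsing and liveness analysis across basic blocks. This version errs on the side of reporting too few free registers, in which case falling back to the stack is the safe choice.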

You'd need a bunch of rewrites to cover all the cases that you want to obfuscate, with their combinatorics with registers and memory operands.

Then the transformation engine can be asked to apply these transformations randomly once or more than once at each point in the code to scramble it.
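If you don't have a transformation engine at hand, the "apply rules randomly" step can be approximated on plain text. The following Python sketch is a rough stand-in, not DMS: the regex patterns, the `{t}` scratch placeholder, and the `obfuscate` function are assumptions made for illustration, and the caller is assumed to pass only scratch registers that are known to be dead at every rewrite point.

```python
import random
import re

# (pattern, replacement template); {t} is a scratch register chosen per match.
RULES = [
    # xor r, m  ==>  (m and not r) or (r and not m)
    (re.compile(r"^xor (?P<r>e\w\w), (?P<m>\[\w+\])$"),
     ["mov {t}, {m}", "not {t}", "and {t}, {r}",
      "not {r}", "and {r}, {m}", "or {r}, {t}"]),
    # or r1, r2  ==>  not (not r1 and not r2)
    (re.compile(r"^or (?P<r1>e\w\w), (?P<r2>e\w\w)$"),
     ["mov {t}, {r2}", "not {t}", "not {r1}", "and {r1}, {t}", "not {r1}"]),
]

def rewrite_line(line, scratch, rng):
    for pattern, template in RULES:
        match = pattern.match(line)
        if not match:
            continue
        fields = match.groupdict()
        # The xor template changes r before its second read of m, so it is
        # wrong when r is also the address register of m. Skip that case.
        if "m" in fields and fields["r"] in fields["m"]:
            continue
        candidates = [reg for reg in scratch if reg not in line]
        if not candidates:
            continue
        return [s.format(t=rng.choice(candidates), **fields) for s in template]
    return [line]  # no applicable rule: keep the instruction unchanged

def obfuscate(lines, scratch, passes=1, p=0.5, seed=None):
    # Each pass rewrites each line with probability p; later passes can
    # rewrite instructions produced by earlier ones.
    rng = random.Random(seed)
    for _ in range(passes):
        out = []
        for line in lines:
            out.extend(rewrite_line(line, scratch, rng) if rng.random() < p else [line])
        lines = out
    return lines
```

For example, `obfuscate(["xor eax, [ecx]"], ["edx"], p=1.0)` expands the XOR into six instructions using `edx` as the temporary. Flags are not preserved by these templates, so rewrites right before a conditional jump would need extra care.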

You can see a fully worked example of some algebraic transforms being applied with DMS.

Ira Baxter
  • Wow, this is possibly the most detailed and informative answer I have ever seen on SO. I'll take a proper look into it all and weigh up my options. Thank you very much :) – Polynomial Oct 30 '11 at 20:57
  • I'm starting to research about obfuscators and this was really helpful! – karliwson Jun 29 '14 at 23:20

Take a look at the Obfusion project. It can obfuscate x86 shellcode pretty well. However, it does not seem to support 64-bit. Most code, algorithms and ideas from this project can probably be applied to your needs though.

Another very nice project to look into is ADVobfuscator, although it applies to C/C++ source code obfuscation via macros.

Another approach could be to implement transformations on top of the internal instruction representation of a disassembler engine such as Zydis.

And don't forget about LLVM-obfuscator which is a C/C++ compiler with obfuscation flags.

BullyWiiPlaza