How to obfuscate x86 assembly code?

Question

For my project, I compute a kind of checksum over a portion of code to protect it. I do not want the template of that checksum code to be easily visible, so I need to obfuscate it.

I have searched a lot on the net and read papers describing obfuscation definitions, types, etc. But there seems to be no tutorial on obfuscating x86 assembly code. Can anybody suggest a simple algorithm/tool for the same?

I have read about inserting dummy code, changing the order of instructions, and other techniques, but they appear to be totally random, i.e., there is no guidance on how much dummy code to insert, etc.

Can somebody at least guide me to the correct approach?

This question should be asked at the reverse engineering stack exchange. http://reverseengineering.stackexchange.com/ — Aswin P J, Apr 22 '16 at 15:00
Can you just read the first two lines ? I have modified the question. — white-hawk-73, Apr 22 '16 at 15:08
@JoseManuelAbarcaRodríguez: Seems to blow up your object code by a factor of 30, if you believe the example they provide. — Ira Baxter, Apr 22 '16 at 15:13
That product is for windows. Is there a similar one for linux ? — white-hawk-73, Apr 22 '16 at 15:14
http://stackoverflow.com/questions/7947353/automated-x86-instruction-obfuscation — Jose Manuel Abarca Rodríguez, Apr 22 '16 at 15:18
Would-be closers: in spite of your claims, OP has asked a programming question, "how do I change my code in an organized way?". If you don't understand the question, leave it alone. — Ira Baxter, Apr 22 '16 at 15:27
I think your only real choice to obfuscate assembly is to use machine code: .byte 0xAA,0xBB,0x12,0x34.... and "assemble" that. Otherwise it doesn't apply. I guess you can make a bunch of macros... — old_timer, Apr 22 '16 at 18:58
So, depending on platform and security: 1) Get a binary image of the code that does what it should. 2) Encrypt it. 3) In its old location, scramble the bits. 4) At runtime, decrypt the code to the original location and mark it as executable. — Frank C., Apr 22 '16 at 19:02
My understanding was you want to obfuscate the machine code (even if you wrote "assembler") in the binary to evade easily recognizable patterns? Or why would you want to hide something in source code? — tofro, Apr 22 '16 at 20:52
Machine code that decodes two different ways, depending on where you start, is pretty confusing for most disassemblers. Write code that jumps back into the middle of one of the instructions that just ran, but starting from that point is a different sequence of instructions (which doesn't have the backwards jump). You need to hand-craft sequences like that, since I don't know of any automated way to do it. It's not always possible to find overlapping instruction sequences you can use. — Peter Cordes, Apr 23 '16 at 03:28
@tofro Suppose somebody wants to tamper with my code. The checksum operation would detect if the instructions have been tampered with. But I want to prevent the attacker from easily identifying the checksum code as he might bypass it. — white-hawk-73, May 01 '16 at 07:34
@tofro See, the thing is the attacker would only have the binary. But he can disassemble using gdb, objdump and other techniques to get the x86 assembly, right? And therefore I want the guards to not be easily visible in the assembly. I will be applying obfuscation on the assembly code itself. — white-hawk-73, May 02 '16 at 13:17
@PeterCordes I didn't understand your point. Can you please explain this: "Write code that jumps back into the middle of one of the instructions that just ran, but starting from that point is a different sequence of instructions (which doesn't have the backwards jump)." — white-hawk-73, May 04 '16 at 12:24
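
Frank C.'s four steps can be sketched roughly as follows. This is only an illustration under stated assumptions: a repeating-key XOR stands in for a real cipher, `xor_scramble`, `code_image`, and `key` are hypothetical names and values, and the final step of marking the restored bytes executable (for example with `mprotect` on Linux) is not shown.

```python
from itertools import cycle


def xor_scramble(image: bytes, key: bytes) -> bytes:
    """XOR each byte with a repeating key; applying it twice restores the input."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(image, cycle(key)))


# Hypothetical machine-code image: mov eax, 1 ; ret
code_image = bytes([0xB8, 0x01, 0x00, 0x00, 0x00, 0xC3])
key = b"\x5a\xa5\x3c"  # hypothetical key, for illustration only

stored = xor_scramble(code_image, key)   # what would ship in the binary
restored = xor_scramble(stored, key)     # what the program rebuilds at runtime
```

The scrambled bytes no longer disassemble to the original instructions, but the key and the decoding loop still sit in the binary, so this only raises the effort needed; it does not hide the code from a debugger that runs past the decoding step.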

score 1 · Accepted Answer · edited May 23 '17 at 11:52

I don't think you can get a "simple" approach.

An assembler program consists largely of strings of instructions. Each instruction does several different things (e.g., add to a register, set condition codes, change the PC and push an address on the stack, ...). However, any particular instruction may be executed for only one of its effects (the "essential effects" for that instruction), with its other effects being ignored.

Your problem is one of changing a working (assembler) program while preserving its essential effects, and using the freedom allowed by ignoring the nonessential ones to add confusion. Fundamentally, you can preserve effects by finding, for a particular instruction sequence having a particular effect, another instruction (or sequence) that has exactly the same algebraic effect and places its results in the same target locations as the original sequence.

What you need is a way to "replace this by (the equivalent) that" for a variety of this and that which are algebraically the same.
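
As a rough illustration of "replace this by (the equivalent) that", here is a small Python sketch that checks two well-known bit-level identities on 32-bit values. The function names are hypothetical, and the model only tracks the register result; it assumes the condition codes set by the real instructions are nonessential at that point in the program.

```python
import random

MASK = 0xFFFFFFFF  # model 32-bit registers


def add_plain(a, b):
    return (a + b) & MASK


def add_obfuscated(a, b):
    # a + b == (a ^ b) + 2 * (a & b): XOR adds without carries, AND finds the carries
    return ((a ^ b) + ((a & b) << 1)) & MASK


def sub_plain(a, b):
    return (a - b) & MASK


def sub_obfuscated(a, b):
    # a - b == a + ~b + 1 in two's complement
    return (a + (~b & MASK) + 1) & MASK


def equivalent(f, g, trials=10_000, seed=0):
    """Compare f and g on edge cases plus random 32-bit inputs."""
    rng = random.Random(seed)
    edge = [0, 1, MASK, 0x80000000, 0x7FFFFFFF]
    samples = [(a, b) for a in edge for b in edge]
    samples += [(rng.getrandbits(32), rng.getrandbits(32)) for _ in range(trials)]
    return all(f(a, b) == g(a, b) for a, b in samples)
```

In assembly terms, a single `add` could then become an `xor`, an `and`, a shift, and an `add` through a scratch register, provided that register and the flags are not needed afterwards. Random testing like this only builds confidence; the identities themselves hold for all inputs.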

You can do this by hand. How much is enough? Stop when the code you want to protect is sufficiently hard for you to understand. [This will probably give you a self-inflicted code maintenance problem if you ever want to change that code.]

An alternative is to use a Program Transformation System, which is a tool for transforming source code, parameterized by descriptions of the programming language to be transformed. See an example of this here: https://stackoverflow.com/a/7947562/120163 This kind of approach means you can leave your code in its original "maintainable" state, and then apply obfuscating transforms as the last build step.
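
To give a feel for "apply obfuscating transforms as the last build step", here is a toy regex-based rewriter in Python. It is not DMS and is far weaker than a real program transformation system: the `RULES` table and the `obfuscate` function are hypothetical, the input is assumed to be Intel-syntax text with one instruction per line, and every rule assumes the flags are dead after the rewritten instruction (for example, `xor` writes flags that `mov` leaves alone, and `add` sets CF while `inc` does not).

```python
import re

# Hypothetical rewrite rules: (pattern, replacement). The first matching rule wins.
RULES = [
    (re.compile(r"^mov (\w+), 0$"), r"xor \1, \1"),
    (re.compile(r"^inc (\w+)$"), r"add \1, 1"),
    (re.compile(r"^add (\w+), (\d+)$"),
     lambda m: f"sub {m.group(1)}, -{m.group(2)}"),
]


def obfuscate(lines):
    """Apply at most one rewrite rule to each instruction line."""
    out = []
    for line in lines:
        text = line.strip()
        for pattern, replacement in RULES:
            new_text, count = pattern.subn(replacement, text)
            if count:
                text = new_text
                break
        out.append(text)
    return out


maintainable = ["mov eax, 0", "inc ecx", "add ebx, 4", "cmp eax, ebx"]
shipped = obfuscate(maintainable)
```

The maintainable source stays untouched, and only the generated output is assembled. A real tool would work on a parsed representation and check that the side effects being dropped are truly unused, which plain regexes cannot do.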


answered Apr 22 '16 at 15:25

Ira Baxter


Thank you. I read that answer as well. But could you explain how do I use that DMS system? – white-hawk-73 May 02 '16 at 13:26
@ak0817: I'm not sure what you are asking. The referenced SO answer describes how to use DMS. That's hardly complete; DMS is a fairly complex system, as it must be to deal with very complex language issues reliably. If you have a specific question, start another SO question. – Ira Baxter May 02 '16 at 14:00
Exactly. I read and didn't understand much. I guess doing it manually might be a better option. – white-hawk-73 May 02 '16 at 14:10
If you are only going to obfuscate a few hundred instructions, it is probably a lot easier to do it manually. If you are going to do a lot more than that, you'll need a tool. If you want to understand more about how DMS rewrite rules work, see http://www.semdesigns.com/Products/DMS/DMSRewriteRules.html – Ira Baxter May 02 '16 at 14:18
I have to obfuscate a code template which is of about 20 lines max. So, I guess manually would be better. But what confused me was there is no fixed way to do it, so didn't know when to stop and in what order to apply steps - it all sounds random. – white-hawk-73 May 02 '16 at 14:34
If you only do 20 instructions, you really haven't obfuscated the code. No, it isn't "random"; my answer above addresses what you need to do. – Ira Baxter May 02 '16 at 14:52
The thing is, for my purpose I just need to obfuscate a small portion of the code, and its length won't go beyond 20 lines. I read your answer as well as the answers to the referenced SO question, where it's mentioned that I have to replace the instruction to obfuscate with its algebraic equivalent. But I also read about techniques like inserting dummy code, etc., and all of these in combination confused me as to the order in which they have to be applied. That is why I used 'random'. – white-hawk-73 May 02 '16 at 16:32
If each transformation you do is either algebraic or inserts dummy code, then it doesn't matter what order you do it in, as each step preserves the code functionality, ensuring the result is still correct. You will get different obfuscated results depending on the order. So, pick an order and do that. (In that sense, you can pick any random order of obfuscation steps, but the process is not random). – Ira Baxter May 03 '16 at 08:38
You wrote "If each transformation you do is either algebraic or inserts dummy code, then it doesn't matter what order you do it in..." Can you suggest some other transformations as well (changing the flow of data, etc.)? Also, if my code consists of mov, cmp, jmp, I can't apply algebraic transformations, right? I want to apply obfuscation to a really small portion of code, 10 to 20 lines, so I am not really sure how to go about it. – white-hawk-73 May 03 '16 at 18:37
I think you miss the point of "algebraic" in the general sense. You have some set of instructions A; you can replace them by B if used_effects(A)==used_effects(B). This means you can replace any set of instructions by an "equivalent" set that is (algebraically) identical. That covers instruction replacement, dummy instruction insertion (this is the computer-instruction equivalent of "add zero"), or scrambling control flow. As long as each replacement is algebraic, you can do them in any order. 20 lines won't be enough obfuscation unless you put in something which is equal but bizarre. – Ira Baxter May 03 '16 at 19:00
Thank you. I am trying to obfuscate some programs on my own by using the tips which you gave. But the thing is, when you insert dummy code, the natural question that comes to mind is what to insert and how. This is why I used the word 'random'. Should I add statements that cancel each other (add, subtract, etc.), and apart from that, what can I do? – white-hawk-73 May 04 '16 at 12:58
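
The used_effects(A)==used_effects(B) test from the comments above can be modelled with a tiny interpreter. This is a hedged sketch, not a real x86 model: the register machine, the `run` and `used_effects` helpers, and both transforms are hypothetical, flags are not modelled, and `edx` is simply assumed to be dead (unused afterwards) in the surrounding code.

```python
MASK = 0xFFFFFFFF


def run(program, regs):
    """Interpret (op, dst, src) tuples on a copy of regs; src is a register name or an int."""
    state = dict(regs)
    for op, dst, src in program:
        value = state[src] if isinstance(src, str) else src
        if op == "mov":
            state[dst] = value & MASK
        elif op == "add":
            state[dst] = (state[dst] + value) & MASK
        elif op == "sub":
            state[dst] = (state[dst] - value) & MASK
        elif op == "xor":
            state[dst] = state[dst] ^ (value & MASK)
        else:
            raise ValueError(f"unknown op: {op}")
    return state


def used_effects(program, regs, live_out):
    """Keep only the registers that the rest of the program actually reads."""
    final = run(program, regs)
    return {r: final[r] for r in live_out}


def add_cancelling_pair(program):
    # Dummy code that cancels itself: the "add zero" of instruction sequences.
    return program + [("add", "eax", 0x1234), ("sub", "eax", 0x1234)]


def add_dead_write(program):
    # Dummy write to a register assumed to be dead here.
    return program + [("xor", "edx", "ecx")]


ORIGINAL = [("mov", "eax", "ebx"), ("add", "eax", 5)]
START = {"eax": 0, "ebx": 7, "ecx": 3, "edx": 9}
LIVE_OUT = ("eax",)

variant_a = add_dead_write(add_cancelling_pair(ORIGINAL))
variant_b = add_cancelling_pair(add_dead_write(ORIGINAL))
```

The two variants are different instruction sequences, yet both leave the same value in `eax` as the original, which is the sense in which the order of the steps does not matter. They do change `edx`, and that is acceptable only because `edx` is assumed to be outside the used effects.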
