Finding Opcodes by length or something else

Question

Is it possible, given a sequence of x86 instruction bytes embedded in a stream of random bytes, to decode the instructions?

Are opcodes of a fixed length or is there any way to detect those instructions?

possible duplicate of [With variable length instructions how does the computer know the length of the instruction being fetched?](http://stackoverflow.com/questions/24269368/with-variable-length-instructions-how-does-the-computer-know-the-length-of-the-i) — Pascal Cuoq, Jul 05 '14 at 10:57

score 1 · Accepted Answer · answered Jul 05 '14 at 10:48

Is it possible, given a sequence of x86 instruction bytes embedded in a stream of random bytes, to decode the instructions?

Yes. Many kinds of processors do it. It is one of the easiest tasks they have.

Are opcodes of a fixed length

No.

or is there any way to detect those instructions?

The first byte(s) of an instruction (any prefixes, the opcode, and the ModRM and SIB bytes when present) let you infer its length. You will easily find tables on the Internet.
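To make the length rule concrete, here is a minimal sketch of a length decoder for a small, hand-picked subset of one-byte opcodes in 32-bit mode. The names `FIXED_LENGTH`, `MODRM_OPCODES`, `modrm_tail` and `insn_length` are made up for this illustration, and the sketch assumes no prefixes, no two-byte opcodes and 32-bit addressing; a real decoder needs the full opcode tables.

```python
# Illustrative subset only: no prefixes, no 0x0F two-byte opcodes,
# 32-bit operand and address size assumed.

# Opcodes whose total length is fixed (opcode byte plus any immediate).
FIXED_LENGTH = {0x90: 1, 0xC3: 1, 0xCC: 1,   # nop, ret, int3
                0x6A: 2, 0xEB: 2,            # push imm8, jmp rel8
                0x68: 5, 0xE8: 5, 0xE9: 5}   # push imm32, call rel32, jmp rel32
for op in range(0x50, 0x60):                 # push r32 / pop r32
    FIXED_LENGTH[op] = 1
for op in range(0xB0, 0xB8):                 # mov r8, imm8
    FIXED_LENGTH[op] = 2
for op in range(0xB8, 0xC0):                 # mov r32, imm32
    FIXED_LENGTH[op] = 5

# Opcodes followed by a ModRM byte and no immediate
# (add, sub, xor, cmp, mov in their r/m32,r32 and r32,r/m32 forms).
MODRM_OPCODES = {0x01, 0x03, 0x29, 0x2B, 0x31, 0x33, 0x39, 0x3B, 0x89, 0x8B}


def modrm_tail(code, pos):
    """Number of bytes that follow the ModRM byte at code[pos]."""
    modrm = code[pos]
    mod, rm = modrm >> 6, modrm & 7
    if mod == 3:                       # register operand: nothing follows
        return 0
    tail = {0: 0, 1: 1, 2: 4}[mod]     # displacement size selected by mod
    if rm == 4:                        # a SIB byte follows
        tail += 1
        if mod == 0 and pos + 1 < len(code) and (code[pos + 1] & 7) == 5:
            tail += 4                  # SIB base 101 with mod 00: disp32
    elif mod == 0 and rm == 5:         # disp32 with no base register
        tail += 4
    return tail


def insn_length(code, offset=0):
    """Length of the instruction at code[offset]; None if unknown or truncated."""
    if offset >= len(code):
        return None
    opcode = code[offset]
    if opcode in FIXED_LENGTH:
        length = FIXED_LENGTH[opcode]
    elif opcode in MODRM_OPCODES and offset + 1 < len(code):
        length = 2 + modrm_tail(code, offset + 1)
    else:
        return None
    return length if offset + length <= len(code) else None


code = bytes.fromhex("b8909090c3")
print(insn_length(code, 0))                         # 5: mov eax, 0xc3909090
print([insn_length(code, i) for i in range(1, 5)])  # [1, 1, 1, 1]: nop, nop, nop, ret
```

The last two lines also show why the starting offset matters: the same five bytes are one instruction when decoded from offset 0 and four different instructions when decoded from offset 1.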

Pascal Cuoq

The variable instruction length makes it problematic to know if you are starting to disassemble from the right byte, though. – DCoder Jul 05 '14 at 10:51
@DCoder (after reading your bio) in practice, I would have thought that you started disassembling from address A because you had encountered a jump to A or for some other reason that led you to think that A was a good starting point, but I guess that computed jumps mean you aren't always sure… – Pascal Cuoq Jul 05 '14 at 10:54
I interpreted the question as "how can I find a sequence of instructions in this random stream of bytes", which would make the starting point fuzzy, but that might just be my interpretation. You are right, of course, if you can see that address A is an instruction starting point, you start from there. – DCoder Jul 05 '14 at 11:02
DCoder is right, that's the point I was interested in: you don't know the starting address – user129506 Jul 05 '14 at 11:24
@user129506 In this case, it depends on whether you are talking about a sequence of bytes maliciously crafted to look like instructions or about normal data. All possible starting points will soon either synchronize (**after** a few dozen bytes, all possible starting points lead to the same sequencing) or be revealed to be nonsensical as a stream of instructions. Recognizing a few instructions in a large stream of what is otherwise data is impossible in general, unless you know something about the instructions (e.g. the compiler that was used to generate them) – Pascal Cuoq Jul 05 '14 at 11:28
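The synchronization effect described in this comment can be sketched with a toy variable-length encoding. This is not x86: `toy_length` is a hypothetical rule (the low two bits of the first byte give a length of 1 to 4), chosen only so that the example stays short. Once two decodings land on the same offset, they agree from there on.

```python
def toy_length(first_byte):
    """Hypothetical encoding: the low two bits of the first byte give a length of 1..4."""
    return 1 + (first_byte & 3)


def boundaries(code, start):
    """Offsets treated as instruction starts when decoding begins at `start`."""
    pos, starts = start, []
    while pos < len(code):
        starts.append(pos)
        pos += toy_length(code[pos])
    return starts


def sync_point(code, a, b):
    """Smallest offset that both decodings treat as an instruction start, or None."""
    seen = set(boundaries(code, a))
    for pos in boundaries(code, b):
        if pos in seen:
            return pos
    return None


code = bytes([3, 0, 1, 2, 0, 0, 1, 0])
print(boundaries(code, 0))     # [0, 4, 5, 6]
print(boundaries(code, 1))     # [1, 2, 4, 5, 6]
print(sync_point(code, 0, 1))  # 4
```

Bytes crafted by hand, as in the malicious case, can be arranged so that the two decodings never meet, which is why the crafted and the ordinary case behave differently.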
Thanks! I was curious about the first: malicious hidden code. So it never happens that, if I get the wrong entry point for a long program, I might interpret the entire sequence as a totally different program? – user129506 Jul 05 '14 at 11:34
@user129506 This is an interesting idea. Yes, with a bit of determination, I would say that it is possible to write arbitrarily long instruction sequences at an offset within other instruction sequences, for nearly any pair of tasks that interpretation 1 and interpretation 2 are supposed to perform. People already write sequences of instructions designed not to contain the byte 0, and this is of the same level of difficulty. In particular, even when a short instruction exists to do something, it is possible to choose a longer instruction to do the same thing. I think I am going to ask my own SO question about this – Pascal Cuoq Jul 05 '14 at 11:41
Thank you, I'm making up my mind on these concepts but they're definitely interesting – user129506 Jul 05 '14 at 11:43
@user129506: There are lots of tricks that can be used to make assembly harder to read and follow. For example, code can contain multiple jumps into different bytes of the same instruction (only one of them ever taken), or, more deviously, one of those jumps can target a byte with a junk value just before the real instruction, just to confuse the disassembler into misreading that instruction. [*Secrets of Reverse Engineering*](http://www.amazon.com/dp/0764574817) has a whole chapter on antidebugging techniques. (**Edit:** also, visit [RE.SE](http://reverseengineering.stackexchange.com/).) – DCoder Jul 05 '14 at 11:50
@DCoder Perhaps you could move or expand this comment into an answer at http://stackoverflow.com/questions/24586242 ? Note that in my question, the first version of the instructions (doing task A) can actually be executed at run-time, to further confuse reverse-engineering tools. – Pascal Cuoq Jul 05 '14 at 12:07
