Finding instruction boundaries for disassembly

Question

Not sure if this is the right community for the question, but bear with me...

On an old Zilog Z80 CPU, it is possible to jump to any byte address in memory, which means it is also possible to jump into the middle of an instruction.

Consider the machine code 21 00 C9 (LD HL, $C900), which sets the HL register to 0xC900 (the 16-bit immediate is stored little-endian). If you were to jump past the 21 into the middle of this instruction (say, with a JR $-2 placed right after it, which targets the 00 byte), the bytes are decoded as 00 C9: NOP followed by RET, a completely different thing.
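To make the example concrete, here is a minimal linear-sweep decoder sketch in Python. It models only the three opcodes used above (an assumption for brevity; a real Z80 table is far larger) and shows that the same three bytes decode differently depending on the starting offset.

```python
# Minimal sketch: only the three Z80 opcodes from the example are modelled.
OPCODES = {
    0x00: ("NOP", 1),
    0x21: ("LD HL,${:04X}", 3),
    0xC9: ("RET", 1),
}

def decode(mem, pc):
    """Decode one instruction at pc; return (text, length)."""
    fmt, length = OPCODES[mem[pc]]
    if length == 3:
        # 16-bit immediates are little-endian: low byte first
        value = mem[pc + 1] | (mem[pc + 2] << 8)
        return fmt.format(value), length
    return fmt, length

def linear_sweep(mem, start):
    """Decode instructions one after another from start until the bytes run out."""
    pc, out = start, []
    while pc < len(mem):
        text, length = decode(mem, pc)
        out.append(text)
        pc += length
    return out

code = bytes([0x21, 0x00, 0xC9])
print(linear_sweep(code, 0))  # ['LD HL,$C900']
print(linear_sweep(code, 1))  # ['NOP', 'RET']
```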

Furthermore, a region of garbage memory or non-executable data that gets interpreted as code can 'desynchronize' the code that follows it. If, say, the last byte of a data region is 21, as above, then the next two bytes (which are really the start of the code) may be taken as the immediate operand of an LD HL, xxxx instruction. Because those first two bytes change meaning, the disassembly of the whole block that follows can change completely.
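The same effect can be reproduced with a linear sweep that starts at the beginning of the data region. This sketch again uses a tiny hypothetical opcode subset and made-up bytes: one data byte $21 followed by the code LD A,$05 / AND A / RET. Opcodes outside the subset, or instructions that would run past the end, are emitted as DB bytes.

```python
# Hypothetical subset of Z80 opcodes: (format string, length in bytes)
OPCODES = {
    0x21: ("LD HL,${:04X}", 3),
    0x3E: ("LD A,${:02X}", 2),
    0xA7: ("AND A", 1),
    0xC9: ("RET", 1),
}

def linear_sweep(mem, start):
    """Decode sequentially from start; unknown or truncated opcodes become DB bytes."""
    pc, out = start, []
    while pc < len(mem):
        op = mem[pc]
        if op not in OPCODES or pc + OPCODES[op][1] > len(mem):
            out.append("DB ${:02X}".format(op))
            pc += 1
            continue
        fmt, length = OPCODES[op]
        if length == 1:
            text = fmt
        elif length == 2:
            text = fmt.format(mem[pc + 1])
        else:
            text = fmt.format(mem[pc + 1] | mem[pc + 2] << 8)  # little-endian word
        out.append(text)
        pc += length
    return out

data = bytes([0x21])                    # last byte of a data region
code = bytes([0x3E, 0x05, 0xA7, 0xC9])  # LD A,$05 / AND A / RET
image = data + code
print(linear_sweep(image, 0))          # ['LD HL,$053E', 'AND A', 'RET']
print(linear_sweep(image, len(data)))  # ['LD A,$05', 'AND A', 'RET']
```

Here the two streams happen to resynchronize at AND A, so only one instruction is lost; in general there is no guarantee they realign that quickly.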

So, my question is: How does a disassembler determine where instruction boundaries are, with these corner cases in mind?

One way is for the disassembler to use a script text file that defines sets of ranges for code and data. It may take a few passes to edit and get the script right, but trying to make a disassembler determine the boundaries for the examples shown would be very complicated. — rcgldr, Aug 24 '17 at 14:42
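A minimal sketch of such a control file, assuming a hypothetical format (one `kind start end` line per range, inclusive hex bounds, `#` comments) and hypothetical helper names `parse_ranges` and `classify`; real tools each define their own format. Later lines override earlier ones, so a refining pass can carve a data table out of a code range without rewriting it.

```python
def parse_ranges(text):
    """Parse lines like 'code 0000 00FF' (inclusive hex bounds); '#' starts a comment."""
    ranges = []
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        kind, lo, hi = line.split()
        if kind not in ("code", "data"):
            raise ValueError(f"line {lineno}: unknown kind {kind!r}")
        ranges.append((kind, int(lo, 16), int(hi, 16)))
    return ranges

def classify(ranges, addr):
    """Return the kind of the last range covering addr (later lines win), else 'unknown'."""
    kind = "unknown"
    for k, lo, hi in ranges:
        if lo <= addr <= hi:
            kind = k
    return kind

# Hypothetical control file with made-up addresses
script = """
code 0000 3FFF
data 0386 03A5   # a lookup table inside the code area
"""
r = parse_ranges(script)
print(classify(r, 0x0000))  # code
print(classify(r, 0x0390))  # data
print(classify(r, 0x4000))  # unknown
```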
Your question already has an answer on [RE SE](https://reverseengineering.stackexchange.com/questions/2347/what-is-the-algorithm-used-in-recursive-traversal-disassembly). — Margaret Bloom, Aug 24 '17 at 14:46
For variable-length instructions like these, the only way is to follow the execution paths. Unfortunately, many if not most of the ROMs you find for those old systems were hand-coded assembly, and sometimes the authors intentionally put in code to defeat disassemblers. So you just have to grind through it. And if there is a jump table, or anything else that has to be executed to determine the possible jump destinations, those paths cannot be covered... — old_timer, Aug 24 '17 at 20:12
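Following execution paths is what the linked RE SE question calls recursive traversal. A minimal sketch in Python, assuming a small hypothetical subset of Z80 opcodes (a real table covers all base opcodes plus the CB/DD/ED/FD prefixes). Conditional branches queue both successors, CALL is assumed to return to the following instruction (an assumption that inline data after a call breaks), and JP (HL) ends the path because its target is only known at run time, which is the jump-table problem. `entry_points` is a list, so more entry points can be appended and the traversal re-run.

```python
from collections import deque

# Hypothetical subset of Z80 opcodes: (mnemonic, length, flow kind).
# "seq" falls through; "jp"/"jr" always jump; "jpcc"/"jrcc" may jump or fall through;
# "call" is assumed to return to the next instruction; "ret"/"indirect" end the path.
OPS = {
    0x00: ("NOP", 1, "seq"),
    0x21: ("LD HL,nn", 3, "seq"),
    0x3E: ("LD A,n", 2, "seq"),
    0xA7: ("AND A", 1, "seq"),
    0x18: ("JR e", 2, "jr"),
    0x28: ("JR Z,e", 2, "jrcc"),
    0xC3: ("JP nn", 3, "jp"),
    0xCA: ("JP Z,nn", 3, "jpcc"),
    0xCD: ("CALL nn", 3, "call"),
    0xC9: ("RET", 1, "ret"),
    0xE9: ("JP (HL)", 1, "indirect"),  # target depends on runtime data: cannot follow
}

def traverse(mem, entry_points):
    """Recursive traversal: return {address: mnemonic} for every reachable instruction start."""
    found = {}
    work = deque(entry_points)
    while work:
        pc = work.popleft()
        # Follow one path until it ends or reaches code that was already decoded.
        while 0 <= pc < len(mem) and pc not in found:
            op = mem[pc]
            if op not in OPS or pc + OPS[op][1] > len(mem):
                break  # unknown or truncated opcode: stop and leave it for a human
            name, length, kind = OPS[op]
            found[pc] = name
            nxt = pc + length
            if kind in ("jp", "jpcc", "call"):
                work.append(mem[pc + 1] | mem[pc + 2] << 8)  # little-endian target
            elif kind in ("jr", "jrcc"):
                disp = mem[pc + 1] - 256 if mem[pc + 1] >= 0x80 else mem[pc + 1]
                work.append(nxt + disp)  # JR offsets are relative to the next instruction
            if kind in ("jp", "jr", "ret", "indirect"):
                break  # no fall-through into whatever bytes come next
            pc = nxt
    return found

# 0: LD A,1  2: AND A  3: JR Z,+3 (to 8)  5: JP $000A  8: RET  9: data byte $21  10: RET
prog = bytes([0x3E, 0x01, 0xA7, 0x28, 0x03, 0xC3, 0x0A, 0x00, 0xC9, 0x21, 0xC9])
print(sorted(traverse(prog, [0])))  # [0, 2, 3, 5, 8, 10]
```

The data byte at address 9 is never decoded, because no path reaches it; a linear sweep would have tried to read it as the start of an LD HL,nn.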
I remember an x86 disassembler that would do its best, but you could then give it additional entry points and it would grind through those sections; or, if you believed it was off and had entered somewhere wrong, you could adjust that and it would redo that part. — old_timer, Aug 24 '17 at 20:15
The Asteroids ROM and/or Asteroids Deluxe has a comparison or a flag setting/clearing instruction followed by a jump on that condition, so the pair acts as an unconditional branch. The data bytes after that conditional branch make the code that follows collide with instructions that start some number of bytes later, so the disassembler disagrees with itself about where instructions start. For that, you have to make your disassembler stop at that special address in that specific ROM. — old_timer, Aug 24 '17 at 20:16
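A disassembler "disagrees with itself" when one address is both the start of an instruction and inside the bytes of another. A small, CPU-independent sketch (hypothetical helper `find_overlaps`, taking a mapping of instruction start address to length) that flags such addresses so a human can decide where to stop or override. A conflict is not always an error: the question's LD HL/NOP/RET example is deliberately overlapping code.

```python
def find_overlaps(instructions):
    """instructions: {start_address: length}. Return sorted (outer, inner) pairs where
    `inner` is an instruction start lying strictly inside the instruction at `outer`."""
    conflicts = []
    for start, length in instructions.items():
        for inner in range(start + 1, start + length):
            if inner in instructions:
                conflicts.append((start, inner))
    return sorted(conflicts)

# The question's example: LD HL,$C900 decoded at 0, plus NOP/RET decoded from a jump to 1.
decoded = {0: 3, 1: 1, 2: 1}
print(find_overlaps(decoded))  # [(0, 1), (0, 2)]
```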
The ZX Spectrum ROM has thousands of data bytes inlined in its code. For example, `rst 40; .byt $E1; .byt $E0; .byt $E2; (9 more); and a; ret`, where the internal calculator is called with a row of special opcodes. A different OS may treat `rst 40` totally differently. You'll never disassemble that correctly without human intervention, sorry. — yacc, Aug 25 '17 at 21:12
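When the convention is known, a human can encode it as a ROM-specific hint, since another system may use `rst 40` differently. A naive sketch, assuming (as in the Spectrum ROM) that the inline calculator bytes following `rst 40` (opcode $EF, i.e. RST 28h) end with the end-calc byte $38, after which normal code resumes. `skip_inline_calc` is a hypothetical helper, and the byte sequence is a shortened, made-up example in the same shape as the comment's.

```python
RST_40 = 0xEF    # Z80 opcode for RST 28h (40 decimal)
END_CALC = 0x38  # assumed terminator of the inline calculator sequence

def skip_inline_calc(mem, pc):
    """Given pc pointing at an RST 40, return (data_bytes, resume_pc).

    Naive sketch: scans byte by byte for END_CALC and does not decode calculator
    literals that carry operands, so a $38 inside an operand would end the scan early."""
    if mem[pc] != RST_40:
        raise ValueError("pc does not point at rst 40")
    i = pc + 1
    while i < len(mem) and mem[i] != END_CALC:
        i += 1
    if i == len(mem):
        raise ValueError("no end-calc byte found")
    return mem[pc + 1:i + 1], i + 1

# rst 40; three calculator bytes; end-calc; and a; ret
code = bytes([0xEF, 0xE1, 0xE0, 0xE2, 0x38, 0xA7, 0xC9])
data, resume = skip_inline_calc(code, 0)
print(data.hex(), resume)  # e1e0e238 5
```

A traversal like the one sketched for following execution paths would call this hint when it meets $EF and continue decoding at `resume` (here the AND A at address 5).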

0 Answers