How to assemble an IDA-generated listing back into an identical executable?

Question

I am trying to reassemble an IDA-generated assembly file back into the real-mode 16-bit MZ executable from which the disassembly was generated. I am using TASM:

tasm /m2 hello
tlink hello

This gives me a lot of warnings with the message "Segment alignment not strict enough" on the lines where IDA generated 'align' directives:

; HELLO.ASM
        .8086
        .model large

; Segment type: Pure code
seg000      segment byte public 'CODE'
        assume cs:seg000
        assume es:nothing, ss:nothing, ds:dseg

[...]
locret_10240:               ; CODE XREF: sub_1022E+2j
        retn
sub_1022E   endp

        align 2             ; generates the warning

; Attributes: library function bp-based frame
__FF_MSGBANNER  proc near       ; CODE XREF: start+28p  start+A4p ...
        push    bp
        mov bp, sp
        mov ax, 0FCh
[...]

The program assembles, links and even runs, but it crashes upon termination. The executable size also differs by a couple of bytes from the original I started with.

Why does IDA generate these align directives?
How can I fix the alignment problem and recreate an identical executable?

Just a guess, but an assembler warning about an `align` directive might be due to not being able to correctly honour it: if the segment you're in doesn't have known alignment (for the start of the segment), it won't know whether a given offset in it is at an odd or even linear address. — Peter Cordes, May 31 '22 at 22:19

neuviemeporte · Answer 1 · 2022-06-01T22:07:37.910


I managed to get rid of the warnings by editing the affected (CODE) segment and changing the alignment to a larger value (was: byte, changed to paragraph):

seg000      segment para public 'CODE'

Looking at the hex view of the original executable, there seem to be zero bytes at the position where the align is placed:

        retn          ; 0xc3 (RET opcode) in hex view
sub_1022E   endp
        align 2       ; zero byte follows
__FF_MSGBANNER  proc near       ; CODE XREF: start+28p  start+A4p ...
        push    bp    ; 0x55 (PUSH BP opcode) in hex view

It looks like the original executable was built with this layout, with routines padded to start on an even address (probably for performance reasons), and IDA generates these directives to reproduce the same layout.
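To make the padding arithmetic concrete, here is a small Python sketch. It assumes that the IDA label names reflect addresses (so `locret_10240` sits at 0x10240, as with IDA's default naming) and that `retn` is the single byte 0xC3; the helper name `align_padding` is made up for illustration.

```python
def align_padding(address: int, boundary: int) -> int:
    """Return how many filler bytes are needed to reach the next multiple of boundary."""
    return (-address) % boundary


# Assumption: the label name locret_10240 encodes the address of the retn.
ret_address = 0x10240
after_ret = ret_address + 1          # retn (0xC3) occupies one byte

pad = align_padding(after_ret, 2)    # what "align 2" has to insert here
next_proc = after_ret + pad

print(pad, hex(next_proc))           # 1 0x10242
```

One filler byte is needed, which matches the single zero byte seen in the hex view. The same arithmetic also suggests why a byte-aligned segment is a problem: if the segment itself may start at an odd address, an even offset inside it is no longer guaranteed to be an even address.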

The remaining problems were caused by TASM emitting JMP instructions with an 8-bit displacement instead of a 16-bit one in some places, which shifted the offsets and made some instructions further down point to incorrect locations. This could probably be fixed by spending more time in IDA converting numeric values to offset references where it failed to recognize them, but since this was a proof of concept, I worked around it by replacing instructions like JMP loc_1234 with DB 0E9h followed by DW loc_1234 - $ - 2, which hard-codes the near-jump opcode and its 16-bit displacement.
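The size difference between the two encodings, and the displacement that the DB/DW workaround computes, can be sketched in Python. The function names and the example offsets (0x0100, 0x0110) are hypothetical; only the opcodes 0xE9 (near, rel16) and 0xEB (short, rel8) and the "relative to the next instruction" rule are standard 8086 facts.

```python
import struct


def encode_jmp_near(jmp_offset: int, target_offset: int) -> bytes:
    """Encode JMP rel16: opcode E9 plus a 16-bit displacement (3 bytes total).

    The displacement is relative to the instruction that follows the jump,
    i.e. jmp_offset + 3, and wraps around within the 64 KiB segment.
    """
    rel16 = (target_offset - (jmp_offset + 3)) & 0xFFFF
    return b"\xE9" + struct.pack("<H", rel16)


def encode_jmp_short(jmp_offset: int, target_offset: int) -> bytes:
    """Encode JMP rel8: opcode EB plus an 8-bit displacement (2 bytes total)."""
    rel8 = target_offset - (jmp_offset + 2)
    if not -128 <= rel8 <= 127:
        raise ValueError("target out of range for a short jump")
    return b"\xEB" + struct.pack("<b", rel8)


# Hypothetical example: a jump at offset 0x0100 targeting offset 0x0110.
print(encode_jmp_near(0x0100, 0x0110).hex())   # e90d00
print(encode_jmp_short(0x0100, 0x0110).hex())  # eb0e
```

In the DB/DW form, `$` on the DW line is the jump's offset plus one (the DB has already been emitted), so `loc_1234 - $ - 2` is the same value as `target_offset - (jmp_offset + 3)` above. Every jump that is assembled short instead of near makes the code one byte shorter, which is enough to shift everything after it.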

After that, TASM succeeded in assembling a clone of the executable that I started with from the IDA-generated disassembly.
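One way to check that the rebuilt file really is a clone is a byte-for-byte comparison that reports the first differing offset. This is only a sketch, not something from the original workflow: the helper names and the file names in the usage comment are hypothetical.

```python
from pathlib import Path
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if a and b are identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One file is a prefix of the other: they diverge where the shorter one ends.
        return min(len(a), len(b))
    return None


def compare_files(original: str, rebuilt: str) -> Optional[int]:
    """Compare two files on disk and return the first differing offset, if any."""
    return first_difference(Path(original).read_bytes(), Path(rebuilt).read_bytes())


# Usage (hypothetical file names):
#   diff_at = compare_files("hello_orig.exe", "hello.exe")
#   print("identical" if diff_at is None else f"first difference at {diff_at:#x}")
```

The first differing offset is usually the most useful one, because a single shortened instruction shifts every byte after it.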

edited Jun 01 '22 at 22:07

answered May 31 '22 at 22:31

neuviemeporte


Have you considered using a better assembler that doesn't ignore `jmp near loc_1234` to force the rel16 encoding? (According to that Q&A you linked, TASM does unfortunately ignore it. Or at least ignores `near ptr`, it doesn't mention trying just `near`.) Having to rewrite all `jmp` instructions to manual encoding seems pretty inconvenient. – Peter Cordes Jun 01 '22 at 22:37
I have, but unfortunately the assembly that IDA spits out seems to be meant to be reassembled with TASM, plus I am working on a project specifically for 16-bit DOS, so my options are rather limited. Rewriting the jumps is not very inconvenient; there are just a few locations where the compiler decided to put a `0xE9` opcode on what would otherwise be a short jump, probably for alignment reasons. – neuviemeporte Jun 02 '22 at 10:47

If it's not every jmp, then more likely because the original was linked from multiple `.asm` / `.o` files, and jumps (e.g. tailcalls) to symbols in other object files had to leave room for a 16-bit displacement, since the actual distance was unknown at assemble time, only known during link time. I think that's more likely than hand-crafted instruction lengthening to align certain later points (like maybe other jump targets). Although that is possible, and more efficient than using NOPs. – Peter Cordes Jun 02 '22 at 16:52
@PeterCordes good point, this is likely the reason, thanks. – neuviemeporte Jun 03 '22 at 09:41
