First, this is kind of a follow-up to Custom memory allocator for real-mode DOS .COM (freestanding) — how to debug?. But to keep it self-contained, here's the background:
clang (and gcc, too) has an -m16 switch, so instructions of the 32-bit i386 instruction set are prefixed for execution in "16bit" real mode. This can be exploited to create DOS .COM 32-bit real-mode executables using the GNU linker, as described in this blog post (of course still limited to the tiny memory model, meaning everything lives in one 64KB segment). Wanting to play with this, I created a minimal runtime that seems to work quite nicely.
Then I tried to build my recently created curses-based game with this runtime, and, well, it crashed. The first thing I encountered was a classic heisenbug: printing the offending wrong value made it correct. I found a workaround, only to face the next crash. So my first suspect was my custom malloc() implementation, see the other question. But as nobody has spotted anything really wrong with it so far, I decided to give my heisenbug a second look. It manifests in the following code snippet (note this worked flawlessly when compiling for other platforms):
typedef struct
{
    Item it;   /* this is an enum value ... */
    Food *f;   /* and this is an opaque pointer */
} Slot;

typedef struct board
{
    Screen *screen;
    int w, h;
    Slot slots[1]; /* 1 element for C89 compatibility */
} Board;
[... *snip* ...]
size = sizeof(Board) + (size_t)(w*h-1) * sizeof(Slot);
self = malloc(size);
memset(self, 0, size);
sizeof(Slot) is 8 (with clang and the i386 architecture), sizeof(Board) is 20, and w and h are the dimensions of the game board, which in the DOS case are 80 and 24 (because one line is reserved for the title/status bar). To debug what's going on here, I made my malloc() print its parameter, and it was called with the value 12 (sizeof(Board) + (-1) * sizeof(Slot)?).
Printing out w and h showed the correct values, yet malloc() still got 12. Printing out size showed the correctly calculated size, and this time malloc() got the correct value, too. So, a classic heisenbug.
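Just to double-check the numbers, the value malloc() should receive is easy to verify with a standalone sanity check, using the sizes given above:
#include <stdio.h>

int main(void)
{
    unsigned w = 80, h = 24;              /* board dimensions under DOS */
    unsigned size = 20 + (w * h - 1) * 8; /* sizeof(Board) + (w*h-1)*sizeof(Slot) */
    printf("%u (0x%x)\n", size, size);    /* prints: 15372 (0x3c0c) */
    return 0;
}
So the expected argument is 15372, which comfortably fits in 16 bits; 12 is nowhere near it.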
The workaround I found looks like this:
size = sizeof(Board);
for (int i = 0; i < w*h-1; ++i) size += sizeof(Slot);
Weirdly enough, this worked. The next logical step: compare the generated assembly. Here I have to admit I'm totally new to x86; my only assembly experience is with the good old 6502. So, in the following snippets, I'll add my assumptions and thoughts as comments; please correct me where I'm wrong.
First the "broken" original version (w
, h
are in %esi
, %edi
):
movl %esi, %eax
imull %edi, %eax # ok, calculate the product w*h
leal 12(,%eax,8), %eax # multiply by 8 (sizeof(Slot)) and add
# 12 as an offset. Looks good because
# 12 = sizeof(Board) - sizeof(Slot)...
movzwl %ax, %ebp # just use 16bit because my size_t for
# realmode is "unsigned short"
movl %ebp, (%esp)
calll malloc
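If I read these instructions correctly, they compute the following (transcribed to C, one line per instruction; broken_path() is just a name I made up for this question):
unsigned short broken_path(unsigned w, unsigned h)
{
    unsigned eax = w * h;        /* imull %edi, %eax                   */
    eax = 8 * eax + 12;          /* leal 12(,%eax,8), %eax             */
    return (unsigned short)eax;  /* movzwl %ax, %ebp (size_t is short) */
}
/* broken_path(80, 24) == 15372, exactly the correct size */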
Now, to me, this looks good, but my malloc() sees 12, as mentioned. The workaround with the loop compiles to the following assembly:
movl %edi, %ecx
imull %esi, %ecx # ok, w*h again.
leal -1(%ecx), %edx # edx = ecx-1? loop-end condition?
movw $20, %ax # sizeof(Board)
testl %edx, %edx # I guess that sets just some flags in
# order to check whether (w*h-1) is <= 0?
jle .LBB0_5
leal 65548(,%ecx,8), %eax # This seems to be the loop body
# condensed to a single instruction.
# 65548 = 65536 (0x10000) + 12. So
# there is our offset of 12 again (for
# 16bit). The rest is the same ...
.LBB0_5:
movzwl %ax, %ebp # use bottom 16 bits
movl %ebp, (%esp)
calll malloc
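The constant 65548 looks odd at first, but modulo 2^16 the arithmetic checks out, as a quick C transcription shows (again, the function name is just mine):
unsigned short workaround_path(unsigned w, unsigned h)
{
    unsigned ecx = w * h;            /* imull %esi, %ecx           */
    unsigned eax = 65548 + 8 * ecx;  /* leal 65548(,%ecx,8), %eax  */
    return (unsigned short)eax;      /* movzwl %ax, %ebp           */
}
/* workaround_path(80, 24): 65548 + 8*1920 = 80908 = 0x13c0c,
 * and the low 16 bits are 0x3c0c = 15372, the correct size again */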
As described before, this second variant works as expected. My question after all this long text is as simple as ... WHY? Is there something special about real mode I'm missing here?
For reference: this commit contains both code versions. Just type make -f libdos.mk for a version with the workaround (which crashes later). To compile the code leading to the bug, first remove -DDOSREAL from the CFLAGS in libdos.mk.
Update: given the comments, I tried to debug this a bit deeper myself. Using dosbox's debugger is somewhat cumbersome, but I finally got it to break at the position of this bug. So, the following assembly code intended by clang:
movl %esi, %eax
imull %edi, %eax
leal 12(,%eax,8), %eax
movzwl %ax, %ebp
movl %ebp, (%esp)
calll malloc
ends up as this (note the Intel syntax used by dosbox's disassembler):
0193:2839 6689F0 mov eax,esi
0193:283C 660FAFC7 imul eax,edi
0193:2840 668D060C00 lea eax,[000C] ds:[000C]=0000F000
0193:2845 660FB7E8 movzx ebp,ax
0193:2849 6766892C24 mov [esp],ebp ss:[FFB2]=00007B5C
0193:284E 66E8401D0000 call 4594 ($+1d40)
I think this lea instruction looks suspicious, and indeed, after it, the wrong value is in ax. So I tried to feed the same assembly source to the GNU assembler, using .code16, with the following result (disassembly by objdump; I think it is not entirely correct because it might misinterpret the size prefix bytes):
00000000 <.text>:
0: 66 89 f0 mov %si,%ax
3: 66 0f af c7 imul %di,%ax
7: 67 66 8d 04 lea (%si),%ax
b: c5 0c 00 lds (%eax,%eax,1),%ecx
e: 00 00 add %al,(%eax)
10: 66 0f b7 e8 movzww %ax,%bp
14: 67 66 89 2c mov %bp,(%si)
The only difference is this lea instruction. Here it starts with 67, meaning "address is 32bit" in 16-bit real mode. My guess is that this is actually needed because lea is meant to operate on addresses and is just "abused" by the optimizer to do data calculation here. Are my assumptions correct? If so, could this be a bug in clang's internal assembler for -m16? Maybe someone can explain where this 668D060C00 emitted by clang comes from and what it means? 66 means "data is 32bit" and 8D is probably the opcode itself, but what about the rest?