How is the global descriptor table copied in MINIX assembly code (x86)

Question

This code in MINIX 3 copies the boot monitor's (bootstrap) GDT into kernel space and switches over to it, but I'm having a hard time understanding it. In the code, _gdt is the address of an array of segment descriptors declared in C (gdt[GDT_SIZE]). Each element of gdt has the following structure:

struct segdesc_s {      /* segment descriptor for protected mode */
  u16_t limit_low;
  u16_t base_low;
  u8_t base_middle;
  u8_t access;          /* |P|DL|1|X|E|R|A| */
  u8_t granularity;     /* |G|X|0|A|LIMT| */
  u8_t base_high;
};

The size of the structure is 8 bytes. The macro GDT_SELECTOR has the value 8.

! Copy the monitor global descriptor table to the address space of kernel and
! switch over to it.  Prot_init() can then update it with immediate effect.

     sgdt   (_gdt+GDT_SELECTOR)     ! get the monitor gdtr
     mov    esi, (_gdt+GDT_SELECTOR+2)  ! absolute address of GDT
     mov    ebx, _gdt           ! address of kernel GDT
     mov    ecx, 8*8            ! copying eight descriptors
copygdt:
eseg movb   al, (esi)
     movb   (ebx), al
     inc    esi
     inc    ebx
     loop copygdt

The most confusing line is movb (ebx), al. Please help.

Peter Cordes · Answer 1 · 2018-02-06T05:22:18.653

This is weird asm syntax. It's using () for memory operands like AT&T syntax, but it only makes sense if it's destination on the left like Intel syntax. (It's also using AT&T-style mnemonic suffixes for operand-size, like movb for byte mov.)

I think it's basically NASM syntax, but with () instead of [], because the comment says mov ebx, _gdt is a mov-immediate of the address. In GAS .intel_syntax noprefix, that would be a load like in MASM syntax.

Minix's compiler has its own flavour of asm, and it's documented at http://www.woodhull.com/newfaq/faq/MinixAsMn.html. (Thanks @MichaelPetch).

So this is a byte-at-a-time copy loop, from es:esi to ds:ebx, for ecx=8*8 bytes. This is exactly what the comments say it does, so that makes it easy to figure out this syntax I hadn't seen before.

movb (ebx), al stores AL into memory, at the address in EBX. i.e. NASM mov [ebx], al or AT&T mov %al, (%ebx).
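As a rough C model of the loop (a sketch only, not MINIX code): the names copy_gdt_bytes, DESC_SIZE and DESC_COUNT are made up for illustration, src stands in for ES:ESI and dst for DS:EBX. Segment registers have no C equivalent, so the model ignores the es override.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names: 8 descriptors of 8 bytes each, as in `mov ecx, 8*8`. */
enum { DESC_SIZE = 8, DESC_COUNT = 8 };

/* Byte-at-a-time copy mirroring the copygdt loop.
 * src plays the role of ES:ESI, dst plays the role of DS:EBX. */
void copy_gdt_bytes(uint8_t *dst, const uint8_t *src)
{
    assert(dst != NULL && src != NULL);
    size_t count = DESC_SIZE * DESC_COUNT;  /* mov  ecx, 8*8        */
    do {
        uint8_t al = *src;                  /* eseg movb al, (esi)  */
        *dst = al;                          /* movb (ebx), al       */
        src++;                              /* inc  esi             */
        dst++;                              /* inc  ebx             */
    } while (--count != 0);                 /* loop copygdt         */
}
```

The do/while shape matches loop, which decrements ECX and jumps back while it is non-zero, so the body always runs at least once.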

The store is using the default segment selector for EBX, which is DS. You wouldn't normally need to mention segments in 32-bit mode, but notice the eseg prefix on the load. You haven't shown, and the comments don't mention, what ES is set to, and why / how it's different from DS.

It seems the code is optimized for code-size, not speed (which is ok because it only runs once at startup). e.g. it's using the slow loop instruction, and it copies one byte at a time so it can inc the pointers (1 byte) instead of add esi, 4 (3 bytes). Still, I suspect that with an indexed addressing mode, you could make it just about as small but copy 4 bytes at a time. (The byte count is fixed at 8*8, so it's always a multiple of 4.)
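To illustrate the 4-bytes-at-a-time idea, here is a C sketch under the same assumptions (copy_gdt_dwords is a hypothetical name, and this models the data movement only, not the instruction encoding or code size). The offset variable plays the role of the index register in an indexed addressing mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy nbytes in 4-byte units. nbytes must be a multiple of 4,
 * which holds for the fixed 8*8 = 64 bytes in the question. */
void copy_gdt_dwords(uint8_t *dst, const uint8_t *src, size_t nbytes)
{
    assert(nbytes % 4 == 0);
    for (size_t off = 0; off < nbytes; off += 4) {
        uint32_t eax;
        memcpy(&eax, src + off, sizeof eax);  /* models a 32-bit load  */
        memcpy(dst + off, &eax, sizeof eax);  /* models a 32-bit store */
    }
}
```

For 64 bytes this runs 16 iterations instead of 64; memcpy is used for the 4-byte accesses only to keep the C free of alignment and aliasing problems.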

The loop is very close to what rep movsb (or rep movsd) does, which is to copy ecx elements from DS:(E)SI to ES:(E)DI. (The ds can be overridden with a segment prefix, so you could e.g. copy from fs:esi to es:edi). But in the Minix code, the loads are from ES:something, and movs always uses es as the destination segment.

fseg rep movsd would have been even more compact (and faster) than a loop, but presumably there was some obstacle to setting up segment registers appropriately. Using EDI and ESI instead of ESI and EBX shouldn't be an obstacle.
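For the two setup instructions before the loop: sgdt does not store a descriptor, it stores the 6-byte GDTR image, a 16-bit limit (table size minus 1) followed by a 32-bit base address. The code parks those 6 bytes at _gdt+GDT_SELECTOR and then loads the 4 base bytes from _gdt+GDT_SELECTOR+2 into ESI. A small C sketch of that layout, where struct gdtr_image and read_gdtr_image are hypothetical names, gdt_bytes stands for the raw bytes of the kernel's gdt array, and little-endian byte order is assumed as on x86:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset where the code stores the sgdt result (value from the question). */
enum { GDT_SELECTOR = 8 };

/* Decoded form of the 6 bytes written by sgdt (hypothetical type name). */
struct gdtr_image {
    uint16_t limit;  /* size of the GDT in bytes, minus 1 */
    uint32_t base;   /* linear address of the GDT         */
};

/* Decode the 6 bytes that sgdt stored at gdt_bytes + GDT_SELECTOR. */
struct gdtr_image read_gdtr_image(const uint8_t *gdt_bytes)
{
    assert(gdt_bytes != NULL);
    const uint8_t *p = gdt_bytes + GDT_SELECTOR;
    struct gdtr_image g;
    g.limit = (uint16_t)(p[0] | (p[1] << 8));        /* bytes 0..1 */
    g.base  = (uint32_t)p[2]                         /* bytes 2..5: what */
            | ((uint32_t)p[3] << 8)                  /* mov esi, (_gdt+GDT_SELECTOR+2) */
            | ((uint32_t)p[4] << 16)                 /* loads into ESI */
            | ((uint32_t)p[5] << 24);
    return g;
}
```

The base field returned here corresponds to the value that ends up in ESI, i.e. the address the copy loop reads from.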

edited Feb 06 '18 at 05:22

answered Feb 06 '18 at 05:11

Peter Cordes


The syntax is correct as it is the Minix CC/assembly syntax (based on Intel). The parentheses have the same meaning as the brackets. – Michael Petch Feb 06 '18 at 05:13
@MichaelPetch: I assumed it was correct and for an obscure assembler. It's still weird (to me at least, and in general in 2018). I'd say the same thing about Go assembler: correct but weird. – Peter Cordes Feb 06 '18 at 05:14
@MichaelPetch: I'm only 38, and interested in asm mostly for making stuff run fast on modern systems (mostly in user-space), not for OS development. Until the last few years, I'd only been casually interested in asm (after learning m68k a while before that), and hadn't really got into looking at / improving compiler output on a regular basis. I've never even used Minix. Linux already existed when I got my first PC. – Peter Cordes Feb 06 '18 at 05:17
No problem. The variant of assembly used in this code is defined here: http://www.woodhull.com/newfaq/faq/MinixAsMn.html – Michael Petch Feb 06 '18 at 05:19
Why does the byte to byte copy begin at `_gdt+GDT_SELECTOR+2` ? The `+2` is very confusing. GDT_SELECTOR will go to the first index of `gdt` as GDT_SELECTOR is 8 bytes. But the +2 will make it copy from `base_low` in the structure at index 1. Please help. Thank you! – Suvrat Apte Feb 06 '18 at 15:42
@SuvratApte sgdt returns 6 bytes and they store the 6 bytes at _gdt+GDT_SELECTOR. The first two bytes are the length of the GDT and the remaining 4 bytes are the address of the GDT. (_gdt+GDT_SELECTOR+2) then is the base address of the GDT you need to copy. The parentheses in `mov esi, (_gdt+GDT_SELECTOR+2)` say to get the 4 bytes at that location and store them in ESI. – Michael Petch Feb 06 '18 at 15:45
@SuvratApte: The copy doesn't begin there. `mov esi, (_gdt+GDT_SELECTOR+2)` *loads* from that address. It's an offset relative to the address where it used `sgdt` to store 6 bytes into memory. Look up what `sgdt` does in an insn ref manual: https://github.com/HJLebbink/asm-dude/wiki/SGDT. – Peter Cordes Feb 06 '18 at 15:48
@MichaelPetch thank you so much! But I still haven't understood. The size of `struct segdesc_s` is 8 bytes. And first 2 fields in the structure are 2 bytes each. Then how would the 6 bytes (2 + 4) fit into this. Maybe I've completely misunderstood it. Thanks for your patience! :) – Suvrat Apte Feb 06 '18 at 16:18
@SuvratApte SGDT doesn't return the GDT itself. It returns a 6-byte structure that contains a 16-bit word that represents the length of the GDT (minus 1), followed by a 32-bit address which is the base of the GDT. This 6-byte structure is what is stored in the GDTR (GDT register). You can see a picture of the GDTR here: https://wiki.osdev.org/Global_Descriptor_Table – Michael Petch Feb 06 '18 at 16:20
@SuvratApte: `sgdt` doesn't store a segment-descriptor struct; it stores a length+pointer to the GDT itself. Read the link in my comment. – Peter Cordes Feb 06 '18 at 16:21
@PeterCordes, MichaelPetch okay thank you so much! I think I need to read a bit more about this. I will post my questions in the comment if I don't understand even after reading. Thanks for your prompt help! :) – Suvrat Apte Feb 06 '18 at 16:26
@PeterCordes: the need to avoid `fseg rep movs` is for two reasons: 1) early 80386 chips had a bug when an instruction had two prefixes, so as late as 1993 programmers often avoided such constructions, and this code was written in 1992; 2) this code is written for the 8088 PC, which does not have an FS segment register. – AntoineL Mar 12 '18 at 14:52
@SuvratApte: you also need to factor in that, since that code is run in 16-bit mode, the absolute address stored by SGDT is restricted to be in the lower 16M, or if you prefer 24 bits. So this code abuses the design of the `struct segdesc_s` structure, and should be read as if the address loaded in ESI was only 24 bits, which fit in `base_low` and `base_middle`. – AntoineL Mar 12 '18 at 15:08
@AntoineL: This code can't run on 8088, it uses 32-bit registers, thus FS is available. So your #1 point is the real reason, thanks for that historical tidbit! Modifying `ds` and using `rep movs` with no prefixes would appear to have been another option, but maybe not dramatically faster or smaller code-size by enough to be worth caring about in this one case. – Peter Cordes Mar 12 '18 at 22:36
@PeterCordes: First, my mistake, this code was initially written for the 80286 PC. The fact it apparently uses 32-bit registers is because Minix 3 dropped the 16-bit protected mode; but if you look at Minix 2, the same code is used for both 16-bit and 32-bit. It was important to keep such critical code simple (debugging it is hard, and was much harder when we did not have virtual machines), and once it worked, there was no compelling reason to change it; and as you noted earlier, performance would certainly not be a reason to risk a break! – AntoineL Mar 13 '18 at 08:35
@AntoineL: Ah right, 16-bit code ported to use 32-bit registers but changed minimally. That makes sense. (And simplicity / being hard to debug probably explains why they didn't swap `ds` and `es` and use `rep movs`. Back then, BOCHS didn't exist in its current state, and simulating a full PC would be pretty resource-intensive for computers back then anyway.) – Peter Cordes Mar 13 '18 at 08:43
This is running off-topic, just note that Tanenbaum introduced x86 simulators before, but indeed debugging protected mode and the operations around the switch were not part of the feature set before VMware and Bochs AFAIK. – AntoineL Mar 13 '18 at 08:56