# Load the GDT.
mov $gdt_descriptor, %ecx
lgdt (%ecx)
mov $0x10, %cx
mov %cx, %ds
mov %cx, %es
mov %cx, %fs
mov %cx, %gs
mov %cx, %ss
ljmp $0x8, $1f
1:  mov $kernel_stack, %esp

I can't work out what this code does. Why does it move $0x10 into %cx and then copy that into the other segment registers after loading the GDT? And what does the ljmp instruction do?

Peter Cordes
Junping Qv
  • Look in http://osdev.org/ – Basile Starynkevitch Jan 13 '18 at 07:47
  • It loads the global descriptor table and then loads the data segment registers with a data selector (probably a flat 32-bit read/write data segment covering 4 GiB). The LJMP is used to set the code segment register (CS), which can't be loaded directly with a `mov` instruction. `1f` refers to the next `1:` label in the forward direction, so the JMP effectively jumps to the next instruction but sets CS at the same time. – Michael Petch Jan 13 '18 at 08:16

1 Answer

The mov instructions to the segment registers reload the segment descriptor caches (inside the CPU) from the new GDT that lgdt told the CPU about.
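As a rough illustration of what lgdt reads, here is a minimal C sketch of its 6-byte memory operand in 32-bit mode: a 16-bit limit followed by a 32-bit base. The names `gdt_pointer` and `make_gdt_pointer` are hypothetical, `__attribute__((packed))` assumes GCC or Clang, and the question doesn't show what `gdt_descriptor` actually contains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Memory operand of lgdt in 32-bit mode: a 16-bit limit followed by a
   32-bit linear base address, 6 bytes with no padding.
   (Hypothetical name; packed attribute is a GCC/Clang extension.) */
struct __attribute__((packed)) gdt_pointer {
    uint16_t limit;  /* size of the table in bytes, minus 1 */
    uint32_t base;   /* linear address of the first descriptor */
};

/* Build the operand for a table holding `entries` 8-byte descriptors. */
static struct gdt_pointer make_gdt_pointer(uint32_t table_addr, size_t entries)
{
    /* A selector has a 13-bit index, so a table has at most 8192 entries. */
    assert(entries >= 1 && entries <= 8192);
    struct gdt_pointer p;
    p.limit = (uint16_t)(entries * 8 - 1);
    p.base = table_addr;
    return p;
}
```

For a table with a null entry plus one code and one data descriptor (an assumption about this kernel's layout), `make_gdt_pointer(addr, 3)` gives a limit of 23.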

The segment descriptors cached inside the CPU don't update automatically when you change table entries or point the GDTR at a different table.

You can even switch back to real mode with DS base=0 limit=4GiB (and same for ES and SS), and use 32-bit addresses in real mode until the next mov ds, r16 or pop ds instruction overwrites the cached segment description. (This is called big / huge unreal mode, huge if you do it for CS as well, but that's less convenient because interrupts in real mode only save IP, not EIP.)

ljmp is a far jmp, which sets CS (in this case to use a different descriptor than the data descriptors). x86 doesn't allow mov or pop to set CS; only far control transfers (far jmp, far call, far ret, iret, or interrupt delivery) change it. Presumably the CPU isn't changing modes with this jump, otherwise the asm source would need to use a .code32 or .code16 directive.

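The target is the 1: label, in the forward direction. So the mov to %esp is decoded/run with whatever code-segment settings were in GDT index 1. (The low 3 bits of a segment selector are not part of the index: bits 0-1 are the requested privilege level and bit 2 selects GDT vs. LDT. The index is the selector shifted right by 3, so $8 is GDT index 1, and $0x10 is GDT index 2.)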
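The selector arithmetic can be checked with a small C sketch. The struct and function names here are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The three fields of a 16-bit segment selector. */
struct selector_fields {
    unsigned index;  /* bits 3-15: entry number in the descriptor table */
    unsigned ti;     /* bit 2: table indicator, 0 = GDT, 1 = LDT */
    unsigned rpl;    /* bits 0-1: requested privilege level */
};

/* Split a selector value into its fields. */
static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti = (sel >> 2) & 1;
    f.rpl = sel & 3;
    return f;
}

/* Build a selector value from its fields. */
static uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl)
{
    assert(index < 8192 && ti < 2 && rpl < 4);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}
```

Decoding 0x08 gives index 1 and decoding 0x10 gives index 2, both with TI = 0 (GDT) and RPL = 0, which matches the selectors used by the ljmp and the mov instructions in the question.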

It's a bit weird to separate the mov to %ss from the instruction that sets %esp, because x86 automatically defers interrupts until the instruction after a mov to SS. This lets you atomically set SS:SP without using cli/sti, but probably this code runs with interrupts disabled already. This code probably only runs once during bootup, so it makes sense to just disable interrupts for as long as necessary to set up a new GDT and IDT.

Peter Cordes
  • It is very likely CLI was issued well before this, since the IDT probably hasn't been set up yet. Of course this is just a guess, but it is very likely the case here. – Michael Petch Jan 13 '18 at 08:33
  • Minor nitpick/trivia: The 8086/8 (not the CMOS versions) did support both `mov cs, r/m16` and `pop cs`. With the 186+ the `mov` became silently dropped (see [here](https://www.vogons.org/viewtopic.php?f=9&t=46108)). Since the 286 the first #UDs and the second is the escape byte (286 introduced PM). In the manuals of that time, the 8086/80186 ones didn't forbid the use of `mov` and `pop` to CS but the latter didn't include the move in the transfer instructions table (the former didn't have such table). The 286+ forbad the pop. – Margaret Bloom Jan 13 '18 at 11:26
  • @MargaretBloom: Amusing historical fact, thanks. That explains why there's room in the instruction encoding for `pop` and `mov` to CS. I'm still happy with my answer saying "x86 doesn't support" (in general), even though a couple from last century do. Definitely none that could run this code (because of the 32-bit operand size) – Peter Cordes Jan 13 '18 at 17:01