I am learning to write an operating system kernel for the i386 by watching some videos. I know the basic procedure for entering protected mode:

In a .code16 file, I first enable the A20 address line and set the PE bit in the CR0 register, and then I ljmp into .code32 code.

Now I am wondering about the differences between .code16 machine code and .code32 machine code.

These are my questions:

  1. Is it valid to use .code16 code in protected mode?
  2. What is the difference between the .code16 machine code and the .code32 machine code generated by the assembler?
  3. I found that it is valid to execute .code16 code after I set the CR0 register and before the ljmp. Why is that?
  4. My teacher told me .code16 means "generate code specific to 16-bit mode" and .code32 means "generate code specific to 32-bit mode". What does that mean?

Sorry for my ignorance; I am a beginner in this field.

Peter Cordes
Markity
  • Look into the operand size prefix to see one way a 16-bit segment differs from a 32-bit one. Setting `cr0` changes how the *new* values put in `cs` will be interpreted. You need a far jump to "apply" the changes. – Margaret Bloom Nov 28 '22 at 17:23
  • `.code16` and `.code32` just tell the assembler what mode to assemble for (https://en.wikipedia.org/wiki/X86-64#Operating_modes), i.e. what they should assume the default address-size and operand-size are when choosing whether to use a `66h` operand-size or `67h` address-size prefix or not. – Peter Cordes Nov 28 '22 at 21:10
  • It's up to you to only put `.code16` code where it will be executed with the CPU in 16-bit mode, and same for `.code32`. Unless you hand-craft machine code that intentionally decodes the same or differently depending on mode, as in [Determine your language's version](https://codegolf.stackexchange.com/a/139717) where the same machine code returns with AL=16, 32, or 64 depending on what mode the CPU was in when called. Unintentionally running 16-bit machine code with the CPU in 32-bit mode or vice versa doesn't go well, especially for memory addressing modes and immediates. – Peter Cordes Nov 28 '22 at 21:12
  • You can use `.code16` in protected mode if you use a 16 bit code segment. Not a common sight though. – fuz Nov 29 '22 at 01:07

1 Answer

What's the difference between .code16 machine code and .code32 machine code generated by assembler?

In 16-bit modes (real mode and 16-bit protected mode) and in 32-bit protected mode, the CPU interprets the bytes of the code differently.

The main difference is that the meanings of the instruction prefixes 66 and 67 (hexadecimal) are reversed:

In 16-bit modes, the CPU uses 16-bit registers and constants and i8086-type addressing modes by default. The prefix 66 (operand-size override) tells the CPU to use 32-bit registers and constants; the prefix 67 (address-size override) tells the CPU to use i80386-type addressing modes:

Program bytes   Instruction understood by the CPU
      8b 08     mov cx,[bx+si]
66    8b 08     mov ecx,[bx+si]
67    8b 08     mov cx,[eax]
66 67 8b 08     mov ecx,[eax]

In 32-bit protected mode, it is the other way round:

      8b 08     mov ecx,[eax]
...
66 67 8b 08     mov cx,[bx+si]
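
The two tables can be reproduced with a small Python model. This is only a sketch of the decoding rule described above, not a real disassembler: it handles just the opcode `8b` with a register-indirect memory operand, and the function name `decode_mov` is made up for this illustration.

```python
# Toy model: how the bytes "8b 08" (plus optional 66/67 prefixes) are read
# depending on the default size of the current code segment (16 or 32).
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG32 = ["e" + name for name in REG16]
# Memory operands for ModRM mod=00. Entries that would need a displacement
# or a SIB byte are deliberately left out (None) to keep the model small.
RM16 = ["bx+si", "bx+di", "bp+si", "bp+di", "si", "di", None, "bx"]
RM32 = ["eax", "ecx", "edx", "ebx", None, None, "esi", "edi"]


def decode_mov(code: bytes, default_bits: int) -> str:
    """Decode 'mov reg,[mem]' (opcode 8b) for a 16- or 32-bit code segment."""
    other_bits = 48 - default_bits  # 16 -> 32, 32 -> 16
    operand_bits = address_bits = default_bits
    i = 0
    while code[i] in (0x66, 0x67):
        if code[i] == 0x66:
            operand_bits = other_bits  # operand-size override
        else:
            address_bits = other_bits  # address-size override
        i += 1
    opcode, modrm = code[i], code[i + 1]
    if opcode != 0x8B or modrm >> 6 != 0:
        raise ValueError("this sketch only handles 8b with mod=00")
    reg = (modrm >> 3) & 7
    mem = (RM16 if address_bits == 16 else RM32)[modrm & 7]
    if mem is None:
        raise ValueError("operand form not covered by this sketch")
    dest = (REG16 if operand_bits == 16 else REG32)[reg]
    return f"mov {dest},[{mem}]"
```

Calling `decode_mov(bytes.fromhex("8b08"), 16)` and `decode_mov(bytes.fromhex("8b08"), 32)` shows the same two bytes being read as two different instructions.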

"generate code specified to 16/32-bit mode" ... so what does that mean?

If one line of your program is mov ecx,[eax], the assembler writes 8b 08 in .code32 mode and 66 67 8b 08 in .code16 mode.

... because the CPU interprets 8b 08 as mov ecx,[eax] when operating in 32-bit mode and it interprets 66 67 8b 08 as mov ecx,[eax] when operating in 16-bit mode.
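
Seen from the assembler's side, the choice is simply "emit a prefix whenever the requested size differs from the mode's default". The following Python sketch models that rule for this one instruction; the helper names are hypothetical, and a real assembler may emit the two prefixes in a different order (the order does not matter to the CPU).

```python
def prefixes_for(mode_bits: int, operand_bits: int, address_bits: int) -> bytes:
    """Return the override prefixes needed in a 16- or 32-bit assembly mode."""
    prefixes = b""
    if operand_bits != mode_bits:
        prefixes += b"\x66"  # operand-size override
    if address_bits != mode_bits:
        prefixes += b"\x67"  # address-size override
    return prefixes


def assemble_mov_ecx_eax(mode_bits: int) -> bytes:
    """Encode 'mov ecx,[eax]' (32-bit operand, 32-bit address)."""
    # 8b = mov r, r/m; ModRM 08 = mod 00, reg 001 (cx/ecx), rm 000
    return prefixes_for(mode_bits, 32, 32) + bytes([0x8B, 0x08])


print(assemble_mov_ecx_eax(32).hex(" "))  # bytes for .code32
print(assemble_mov_ecx_eax(16).hex(" "))  # bytes for .code16
```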

Is it valid to use .code16 code in protect mode?

I mentioned "16-bit protected mode" above.

Strictly speaking, there is no separate "16-bit protected mode"; there is only one "protected mode", in which you can create both 16-bit and 32-bit descriptors in the GDT (or LDT).

To execute 16-bit code in protected mode, you must create a 16-bit code descriptor (in the GDT or the LDT) and perform an ljmp to that code.

(Executing 16-bit code in protected mode is required to switch a 32-bit CPU from protected mode back to real mode.)

Note that the descriptors for 16-bit code (and the stack!) must have a size of 64 KiB or less. This means that you cannot create one single descriptor describing the whole 4 GiB of memory (as is done for 32-bit code); instead, it might be necessary to create multiple descriptors for code that is located in different parts of memory.
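
To make the difference between a 16-bit and a 32-bit code descriptor concrete, here is a Python sketch that packs the 8 bytes of a segment descriptor. The function name and parameters are invented for this example, and the access byte `0x9A` (present, ring 0, readable code) is just one common choice. The only difference that selects 16-bit or 32-bit decoding is the D bit; the G bit and the limit control the segment size.

```python
def make_descriptor(base: int, limit: int, access: int,
                    default_32bit: bool, page_granular: bool) -> bytes:
    """Pack a segment descriptor into its 8-byte little-endian form."""
    if not 0 <= base <= 0xFFFFFFFF:
        raise ValueError("base must fit in 32 bits")
    if not 0 <= limit <= 0xFFFFF:
        raise ValueError("limit must fit in 20 bits")
    # Flags nibble: bit 2 = D (default size), bit 3 = G (4 KiB granularity)
    flags = (int(default_32bit) << 2) | (int(page_granular) << 3)
    value = limit & 0xFFFF                  # limit bits 15..0
    value |= (base & 0xFFFFFF) << 16        # base bits 23..0
    value |= (access & 0xFF) << 40          # access byte
    value |= ((limit >> 16) & 0xF) << 48    # limit bits 19..16
    value |= flags << 52                    # flags nibble
    value |= ((base >> 24) & 0xFF) << 56    # base bits 31..24
    return value.to_bytes(8, "little")


# Flat 4 GiB 32-bit code segment: limit 0xFFFFF pages, D=1, G=1
code32 = make_descriptor(0, 0xFFFFF, 0x9A, default_32bit=True, page_granular=True)
# 64 KiB 16-bit code segment: limit 0xFFFF bytes, D=0, G=0
code16 = make_descriptor(0, 0xFFFF, 0x9A, default_32bit=False, page_granular=False)
```

The resulting bytes could be placed in a GDT with `.byte` or `.quad` directives; a far jump to the selector of `code16` would then run 16-bit code while still in protected mode.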

I found it is valid to execute .code16 code after I set CR0 register and before ljmp, that's why?

Internally, the segment registers (cs, ds ...) seem to be about 80 bits long but only 16 of these 80 bits are visible to the programmer.

One of the "hidden" bits of the cs register specifies if the CPU executes 16- or 32-bit code. (In protected mode, this bit is read from the GDT or LDT.)

According to what I have read about the so-called "unreal mode", the main difference between "real mode" and "protected mode" inside the i80386 CPU seems to be how the "hidden" bits of the segment registers are updated when the value of a segment register is changed. (There are also differences in interrupt handling etc.)

If this is true, setting or clearing bit 0 of CR0 has (nearly) no effect at all until a segment register is changed (by performing ljmp, mov ds,ax ... or an interrupt).
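
The behaviour described above can be pictured with a toy Python model. It is a deliberate simplification based only on the description in this answer, not on how any real CPU is implemented; the class and field names are invented, and the "GDT" is just a dictionary.

```python
from dataclasses import dataclass


@dataclass
class HiddenPart:
    """The programmer-invisible part of a segment register (simplified)."""
    base: int = 0
    limit: int = 0xFFFF
    default_bits: int = 16


class ToyCpu:
    def __init__(self, gdt: dict):
        self.pe = False          # models bit 0 of CR0
        self.gdt = gdt           # index -> HiddenPart (stand-in for a real GDT)
        self.cs = HiddenPart()   # state after reset: 16-bit, 64 KiB limit

    def far_jump(self, selector: int) -> None:
        if self.pe:
            # Protected mode: all hidden bits are reloaded from the descriptor.
            entry = self.gdt[selector >> 3]
            self.cs = HiddenPart(entry.base, entry.limit, entry.default_bits)
        else:
            # Real mode: only the base changes; limit and size bit are kept.
            self.cs.base = selector << 4


cpu = ToyCpu({1: HiddenPart(base=0, limit=0xFFFFFFFF, default_bits=32)})
cpu.pe = True                    # like setting bit 0 of CR0
bits_before_jump = cpu.cs.default_bits   # hidden part of cs is unchanged
cpu.far_jump(0x08)               # like ljmp $0x08, $target
bits_after_jump = cpu.cs.default_bits    # now taken from the descriptor
```

In this model, `bits_before_jump` is still 16 even though `pe` is already set, which corresponds to .code16 instructions continuing to work between the CR0 write and the ljmp.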

Martin Rosenau
  • "internally always seems to operate in the same mode" is a bit confused. I agree about the `cr0` effect, but that is exactly what I would understand as "operating in a different mode". I think the similarities/differences of PM and R86M should rather be listed as follows: Loading segregs loads only the base, according to Real Address 8086 Mode; all segregs are by default set up to have a 64 KiB limit and R/W/X access rights (even for `cs`), and interrupts and exceptions go through the 86M IVT. (If you access an `a32` address or the word at offset 0FFFFh, an R86M interrupt 0Ch/0Dh is issued.) – ecm Nov 29 '22 at 08:56
  • Just a note that the Intel documentation does make reference to 16-bit and 32-bit protected mode. – Michael Petch Nov 29 '22 at 09:07
  • "32-bit protected mode" is a handy way to refer to being in protected mode, with CS loaded from a 32-bit code segment. That's true whether you look at 16 and 32-bit as sub-modes of protected mode, or states that you don't call a "mode", instead needing some other term like "default operand-size". (Except "32-bit default operand-size" doesn't distinguish "64-bit mode" from 32-bit legacy protected mode or 32-bit compat mode. https://en.wikipedia.org/wiki/X86-64#Operating_modes) – Peter Cordes Nov 29 '22 at 09:12
  • Re: unreal mode: https://wiki.osdev.org/Unreal_Mode / https://en.wikipedia.org/wiki/Unreal_mode. Switch to protected mode, set DS/ES and maybe SS segment limits to unlimited, then switch back to real mode. IIRC, setting those segment registers in real mode will just update the segment base, without resetting the limit to 64K. So yeah, always operating like protected mode in that sense, with real mode being obtained by having the limits set to 64K. I'm not sure how privilege levels or any other GDT fields interact, fortunately never had to care about it. – Peter Cordes Nov 29 '22 at 09:14
  • @MartinRosenau I agree, the limit and base are set up in one or another way and then the segmentation mechanism is carried out the same way regardless what mode you're in, utilising whatever limit and base were set up. – ecm Nov 29 '22 at 09:24
  • @ecm I tried to re-word the corresponding paragraph. – Martin Rosenau Nov 29 '22 at 09:28
  • @MichaelPetch That's true. However, when I am talking about "16-bit protected mode" in the context of this question, I actually mean: "A 16-bit code segment in the mode that is called '32-bit protected mode' in Intel's documentation." – Martin Rosenau Nov 29 '22 at 09:34
  • good, I basically understand – Markity Dec 03 '22 at 14:49