I am studying the Linux kernel source (old version 0.11). While reading about the fork system call, I found some asm code for context switching:

/*
 * switch_to(n) should switch tasks to task nr n, first
 * checking that n isn't the current task, in which case it does nothing.
 * This also clears the TS-flag if the task we switched to has used
 * tha math co-processor latest.
 */
#define switch_to(n) {\
struct {long a,b;} __tmp; \
__asm__("cmpl %%ecx,current\n\t" \
    "je 1f\n\t" \
    "movw %%dx,%1\n\t" \
    "xchgl %%ecx,current\n\t" \
    "ljmp *%0\n\t" \
    "cmpl %%ecx,last_task_used_math\n\t" \
    "jne 1f\n\t" \
    "clts\n" \
    "1:" \
    ::"m" (*&__tmp.a),"m" (*&__tmp.b), \
    "d" (_TSS(n)),"c" ((long) task[n])); \
}

I guess that the "ljmp %0\n\t" is what changes the TSS and LDT. I know that the ljmp instruction needs two parameters, like ljmp $segment, $offset. I think the ljmp instruction has to use _TSS(n), xx. We don't need to provide a meaningful offset value, because the CPU will reload its registers, including EIP, for the new task.

  1. I don't understand how ljmp %0 works like ljmp $segment, $offset, or why this instruction uses %0. Is %0 just the address of __tmp.a?

  2. The CPU might save the EIP register to the TSS of the old task when executing the ljmp instruction. Am I right that the EIP value for the old task is the address of "cmpl %%ecx,_last_task_used_math\n\t"?

Peter Cordes
bongsu
  • This is awful to read, and some comments by Linus might have been nice. `ljmp %0` will jump to the 48-bit address contained at memory address %0. So effectively it will `ljmp` to the address contained at memory address `__tmp`. You'll observe that `movw %%dx,%1` effectively initializes `__tmp.b` with the value `_TSS(n)`. *_TSS(n)* will be a segment descriptor for a task gate. You'll notice `%0` (*__tmp.a*) is not initialized. It doesn't need to be, since the offset (which *__tmp.a* represents) is ignored when you _ljmp_ through a task gate. Effectively you do an ljmp to segment:offset _TSS(n):garbage. – Michael Petch Nov 18 '15 at 16:47
  • ljmp _TSS(n):garbage, when _TSS(n) represents a task gate selector, will switch tasks based on the task selector, ignoring the offset (so it doesn't need to be set to anything), and continue execution at the instruction after ljmp in the new task context. – Michael Petch Nov 18 '15 at 16:52
  • `cmpl %%ecx,_last_task_used_math` will be executing in context of a new task after ljmp. I haven't looked at the old kernel source, but it seems that *_last_task_used_math* is the taskid of the last task that used a math instruction. If it is different from the current taskid then the `clts` instruction is avoided. – Michael Petch Nov 18 '15 at 17:07
  • Also note that the TSS task switching has been abandoned in linux, so it's only of historical interest. – Jester Nov 18 '15 at 17:08
  • Yes @Jester, I was only commenting on that old kernel, since that seems to be what the OP was interested in. Linux has come a long way since 0.11 ;) – Michael Petch Nov 18 '15 at 17:09
  • Yeah, and I have seen it's an old version, but maybe the OP doesn't know this particular approach has been totally abandoned. – Jester Nov 18 '15 at 17:10
  • @MichaelPetch As far as I can see, the source of Linux 0.11 does contain a comment before this macro definition. If so, OP should have copied it along with the code. Thanks for the task gate explanation, it's one of the things I never read up on and exactly the thing I couldn't explain. –  Nov 18 '15 at 17:56
  • @MichaelPetch thank you for your answer. but i don't understand how ljmp %0 will be ljmp TSS(n):garbage. – bongsu Nov 19 '15 at 12:24
  • @MichaelPetch I think I have to change code like ljmp %1:%0. – bongsu Nov 19 '15 at 12:30
  • @Rhymoid Sorry, I updated comment. – bongsu Nov 19 '15 at 12:34
  • I created a community Wiki to elaborate @MichaelPetch's informative comments. –  Nov 19 '15 at 13:53
  • `ljmp TSS(n):garbage` was a simplification, and I must apologize. The pointer itself is actually at _tmp_. The form of `ljmp` being used is an indirect long jump, where the one parameter is a pointer to a memory location that holds the pointer to jump to. What is interesting is that the assembler used (2 decades ago) was more lax about the syntax for an `ljmp`. To avoid confusion now, if you want to denote an indirect far jmp through a pointer it would look like `ljmp *%0`. The asterisk says we are jumping to an address contained at a memory location (%0 is the address of _tmp_) – Michael Petch Nov 19 '15 at 19:18
  • Using a modern day GNU assembler it will actually warn you with this _Warning: indirect ljmp without *_ . I think the idea of the warning is a good one. Although the assembler (on x86) can infer you mean an indirect long jump by virtue of one memory operand, it is suggesting that it is better form to add an asterisk when denoting indirect jumps which might have avoided confusion. – Michael Petch Nov 19 '15 at 19:26
  • @Rhymoid : As you can tell I probably didn't go looking at the kernel code in question (thanks for the heads up)- or I would have known Linus had some code comments. I think that in this case inline comments within the code could have been beneficial since it is unclear exactly what happens with the `ljmp` (when using a target address that uses a task descriptor) – Michael Petch Nov 19 '15 at 19:32
  • I would recommend this [book](http://oldlinux.org/download/ECLK-5.0.1-WithCover.pdf). – Li-Guangda Nov 22 '22 at 09:22

1 Answer

What does this syntax even mean?

This unreadable mess is GCC's Extended ASM, which has a general format of

 asm [volatile] ( AssemblerTemplate
                : OutputOperands
              [ : InputOperands
              [ : Clobbers ] ] )

In this case, the __asm__ statement only contains an AssemblerTemplate and InputOperands. The input operands part explains what %0 and %1 mean, and how ecx and edx get their value:

  • The first input operand is "m" (*&__tmp.a), so %0 becomes the memory address of __tmp.a (to be perfectly honest, I'm not sure why *& is needed here).
  • The second input operand is "m" (*&__tmp.b), so %1 becomes the memory address of __tmp.b.
  • The third input operand is "d" (_TSS(n)), so the DX register will contain _TSS(n) when this code starts.
  • The fourth input operand is "c" ((long) task[n]), so the ECX register will contain task[n] when this code starts.

When cleaned up, the code can be interpreted as follows

    cmpl %ecx, _current
    je 1f

    movw %dx, __tmp.b          ;; the address of __tmp.b
    xchgl %ecx, _current
    ljmp __tmp.a               ;; the address of __tmp.a

    cmpl %ecx, _last_task_used_math
    jne 1f
    clts
1:

How can ljmp %0 even work?

Please note that there are two forms of the ljmp (also known as jmpf) instruction. The one you know (opcode EA) takes two immediate arguments: one for the segment, one for the offset. The one used here (opcode FF /5) is different: the segment and offset are not in the code stream but somewhere in memory, and the instruction's operand is the address of that memory.

In this case, the argument to ljmp points at the beginning of the __tmp structure. The first four bytes (__tmp.a) contain the offset, and the two bytes that follow (the lower half of __tmp.b) contain the segment.

This indirect ljmp __tmp.a would be equivalent to ljmp [__tmp.b]:[__tmp.a], except that ljmp segment:offset can only take immediate arguments. If you want to switch to an arbitrary TSS without self-modifying code (which would be an awful idea), the indirect instruction is the one to use.

Also note that __tmp.a is never initialised. We can assume that _TSS(n) is a selector that refers to a TSS descriptor or a task gate (because that's the way you do context switches with the TSS), and the offset of a far jump through either one is ignored.

Where does the old instruction pointer go?

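This piece of code doesn't explicitly store the old EIP in the TSS; the CPU does that itself. As part of the hardware task switch triggered by ljmp, the processor saves the outgoing task's registers in its TSS, and the saved EIP points at the instruction after the ljmp (the cmpl), which is where the old task resumes when it is switched back in.

(I'm guessing after this point, but I think this guess is reasonable.)

The EIP at which the old task was interrupted in user space is stored on the kernel-space stack that corresponds with the old task.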

Linux 0.11 allocates a ring 0 stack (i.e. a stack for the kernel) for each task (see the copy_process function in fork.c, which initialises the TSS). When an interrupt happens during task A, the old EIP is saved on the kernel-space stack rather than the user-space stack. If the kernel decides to switch to task B, the kernel-space stack is also switched. When the kernel eventually switches back to task A, this stack is switched back, and through an iret we can return to where we were in task A.

  • A welcome addition would be an explanation on how the math co-processor is related to the TS flag. –  Nov 19 '15 at 13:59
  • Thanks for formalizing my comments and putting them in as a community wiki (kudos). The only reason I didn't do it was because I didn't go looking at the kernel code to confirm what was really going on. As for the coprocessor, when there is a task switch the TS flag gets set so that the next coprocessor operation throws an exception that can be caught. When the exception is caught, the kernel can then save the coprocessor state on the stack. If the previous task and the current task are the same you clear it, because there is no need to save the coprocessor state, and to avoid throwing an exception – Michael Petch Nov 19 '15 at 17:35