
Background: I am using x86's GS register to store the base linear address of the per-CPU data area (more precisely, a segment selector referring to it), much like the trick used in Linux. I need to obtain pointers (linear addresses) to the per-CPU variables inside this area. However, the LEA instruction cannot take GS into account: by definition, an "effective address" is only the offset part of a segmented linear address, so LEA computes the offset regardless of any segment override. I have noticed that Linux stores the base address of the per-CPU data area in a per-CPU variable to work around this, but that approach requires a memory load to get the pointer.
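The workaround described above can be modeled in portable C without touching a real segment register. In this sketch (all names are hypothetical), the first word of each per-CPU area holds the area's own address. One GS-relative load (`mov reg, gs:0`) therefore recovers the base, and adding a field offset gives the variable's linear address. The variable `simulated_gs_base` stands in for the hidden GS base, and `gs_load_self` represents the single memory load the question is trying to avoid. The x86-64 ELF TLS ABI uses the same self-pointer trick for FS.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof */

/* Hypothetical per-CPU area. Its first word points back at the area itself,
 * so one segment-relative load ("mov reg, gs:0") recovers the base. */
struct percpu_area {
    struct percpu_area *self;  /* offset 0: the area's own linear address */
    long counter;              /* an ordinary per-CPU variable */
};

#define NR_CPUS_SKETCH 4
static struct percpu_area areas[NR_CPUS_SKETCH];

/* Stand-in for the hidden GS base. Only the "segment hardware" helpers below
 * use it; ordinary code cannot read it, just as it cannot read the descriptor cache. */
static char *simulated_gs_base;

static void percpu_init(void) {
    for (int i = 0; i < NR_CPUS_SKETCH; i++)
        areas[i].self = &areas[i];  /* store each area's own address in its first word */
}

/* Models loading GS on CPU `cpu` (done once per CPU, e.g. at boot). */
static void set_simulated_gs(int cpu) {
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    simulated_gs_base = (char *)&areas[cpu];
}

/* Models "mov reg, gs:0": segment base + offset 0, then one memory load. */
static struct percpu_area *gs_load_self(void) {
    return *(struct percpu_area **)(simulated_gs_base + offsetof(struct percpu_area, self));
}

/* Linear address of a per-CPU field: one load, then base + field offset. */
#define THIS_CPU_PTR(field) (&gs_load_self()->field)

/* A pre-existing pointer-based API that knows nothing about per-CPU data. */
static void counter_add(long *c, long delta) { *c += delta; }
```

The resulting plain pointer can be passed to existing pointer-based code such as `counter_add` without changes. The cost of this design is the load in `gs_load_self`.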

My question: Is there a single instruction that yields the linear address without a memory load (just as LEA does for the effective address)? As far as I know, the segment's base linear address is cached in the "hidden part" of the segment register, so I believe this is technically possible, but I have not found an answer anywhere.

Peter Cordes
  • In 64-bit mode there is an optional `RDGSBASE` instruction, which your CPU might support. The base address is also mapped to an MSR. – Jester Aug 30 '22 at 09:56
  • The descriptor cache is not architecturally visible, so you can't take for granted that architectural code can read it (microcode surely can). As Jester pointed out, `rdgsbase` and the MSR `IA32_GS_BASE` (0xc0000101) can indeed be used to read the `gs` base. I would do some profiling to see whether `rdgsbase` really improves anything. Reading an MSR is probably microcoded, so it should be slower than a load that hits in L1 (or even L2). – Margaret Bloom Aug 30 '22 at 10:17
  • I am a bit confused, though. Why would you need to LEA relative to GS? If you store something in your per-CPU data area, you will still need a dereference to access it. What would be the purpose of LEAing the base of the per-CPU area? You would still need something like `mov reg, gs:OFF` to load the per-CPU data. – Marco Bonelli Aug 30 '22 at 14:04
  • I have considered the MSR solution, and I have concluded that the potential speedup (if it is not actually a slowdown) is not worth the mess, as @MargaretBloom mentioned. – Wenyang Luo Aug 30 '22 at 16:02
  • @MarcoBonelli Well, the intention is convenience rather than speed. Suppose we have implemented APIs for some complicated objects that operate on object pointers. Now we decide to create per-CPU versions of these objects. It would be convenient to reuse the existing APIs with minimal effort; otherwise, in many cases not only the APIs but also the primitives used to implement them would have to change. – Wenyang Luo Aug 30 '22 at 16:07
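The comments above mention `RDGSBASE` and the `IA32_GS_BASE` MSR. The following is a minimal sketch that can be tried without writing kernel code, assuming x86-64 Linux user space. It executes `RDGSBASE` only when the kernel reports `HWCAP2_FSGSBASE` in the auxiliary vector, because otherwise the instruction faults. If that bit is absent, it falls back to the `arch_prctl(ARCH_GET_GS)` system call. The helper names `read_gs_base` and `gs_linear_address` are hypothetical. In ring 0, the alternative would be `RDMSR` with ECX = 0xC0000101 (`IA32_GS_BASE`), which is not shown here. In user space the GS base is typically 0 on x86-64 Linux, so the result is only meaningful once something has set it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* for syscall() in <unistd.h> */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && defined(__linux__)
#include <sys/auxv.h>     /* getauxval(), AT_HWCAP2 */
#include <sys/syscall.h>  /* SYS_arch_prctl */
#include <unistd.h>       /* syscall() */

#ifndef AT_HWCAP2
#define AT_HWCAP2 26                /* value from <elf.h> */
#endif
#ifndef HWCAP2_FSGSBASE
#define HWCAP2_FSGSBASE (1UL << 1)  /* value from <asm/hwcap2.h> (Linux 5.9+) */
#endif
#ifndef ARCH_GET_GS
#define ARCH_GET_GS 0x1004          /* value from <asm/prctl.h> */
#endif

/* RDGSBASE raises #UD unless the OS has set CR4.FSGSBASE; Linux reports
 * that it has done so through the HWCAP2_FSGSBASE bit. */
static int rdgsbase_usable(void) {
    return (getauxval(AT_HWCAP2) & HWCAP2_FSGSBASE) != 0;
}

/* One instruction, no memory operand: copies the GS base into a register. */
static uint64_t gs_base_rdgsbase(void) {
    uint64_t base;
    __asm__ volatile("rdgsbase %0" : "=r"(base));
    return base;
}
#endif

/* Stores the current GS base in *out; returns 0 on success, -1 if no method works. */
int read_gs_base(uint64_t *out) {
    assert(out != NULL);
#if defined(__x86_64__) && defined(__linux__)
    if (rdgsbase_usable()) {
        *out = gs_base_rdgsbase();
        return 0;
    }
    /* Fallback: a system call, so the kernel looks the value up for us. */
    unsigned long base;
    if (syscall(SYS_arch_prctl, ARCH_GET_GS, &base) == 0) {
        *out = base;
        return 0;
    }
#endif
    (void)out;
    return -1;
}

/* The "LEA with GS" the question asks for: GS base + offset. */
int gs_linear_address(size_t offset, uint64_t *out) {
    uint64_t base;
    assert(out != NULL);
    if (read_gs_base(&base) != 0)
        return -1;
    *out = base + offset;
    return 0;
}
```

Whether the `RDGSBASE` path is actually faster than a load of a self pointer that hits in L1 is not established here. As the comments suggest, that would need profiling.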

0 Answers