
I have read *An Introduction to the Intel® QuickPath Interconnect*. The document does not mention that QPI is used by processors to access memory. So I think that processors don't access memory through QPI.

Is my understanding correct?

Jingguo Yao

2 Answers


Intel QuickPath Interconnect (QPI) is not wired to the DRAM DIMMs and as such is not used to access the memory that is connected to the CPU's integrated memory controller (iMC).
The paper you linked includes this picture:

Intel Socket connection, with QPI connections separated from memory lines

That shows the connections of a processor, with the QPI signals pictured separately from the memory interface.

The text just before the picture confirms that QPI is not used to access memory:

The processor also typically has one or more integrated memory controllers. Based on the level of scalability supported in the processor, it may include an integrated crossbar router and more than one Intel® QuickPath Interconnect port.

Furthermore, if you look at a typical datasheet you'll see that the CPU pins for accessing the DIMMs are not the ones used by QPI.


QPI is, however, used to access the uncore, the part of the processor that contains the memory controller.

QPI to access the DRAM controller (courtesy of the QPI article on Wikipedia)

QPI is a fast internal general-purpose bus: in addition to giving access to the uncore of the CPU, it gives access to the uncores of other CPUs. Thanks to this link, every resource available in the uncore can potentially be accessed over QPI, including the iMC of a remote CPU.

QPI defines a protocol with multiple message classes, two of which are used to read memory through another CPU's iMC.
The flow uses a stack similar to the usual network stack.

Thus the path to remote memory includes a QPI segment, but the path to local memory doesn't.
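
To make the distinction concrete, here is a minimal sketch of how software ends up on one path or the other (assuming Linux and libnuma; the node number, buffer size and file name are only illustrative). A buffer allocated on the calling thread's own node is reached through the local iMC only, while a buffer forced onto another socket's node is reached through a QPI link to that socket's iMC:

```
/* qpi_local_remote.c - sketch, compile with: gcc qpi_local_remote.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not supported on this system\n");
        return 1;
    }

    size_t len = 64UL * 1024 * 1024;

    /* Backed by the node of the calling CPU: reached via the local iMC,
       no QPI segment on the path. */
    char *local = numa_alloc_local(len);

    /* Forced onto node 1 (assumed here to be another socket): a thread
       running on socket 0 reaches it through QPI to socket 1's iMC. */
    char *remote = numa_alloc_onnode(len, 1);

    if (!local || !remote)
        return 1;

    memset(local, 0, len);    /* local accesses              */
    memset(remote, 0, len);   /* remote accesses cross QPI   */

    numa_free(local, len);
    numa_free(remote, len);
    return 0;
}
```

This needs a machine with at least two NUMA nodes; on a single-socket system both buffers end up behind the same iMC.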

Update

For the Xeon E7 v3-18C CPU (designed for multi-socket systems), the Home agent doesn't access the DIMMs directly; instead it uses an Intel SMI2 link to reach the Intel C102/C104 Scalable Memory Buffer, which in turn accesses the DIMMs.

The SMI2 link is faster than DDR3, and the memory buffer implements either reliability (lockstep) or interleaving across the DIMMs, depending on the configuration.

Xeon E7 v3 18C with SMI2 links


Initially the CPU used an FSB to access the North bridge; the North bridge contained the memory controller and was linked to the South bridge (ICH, IO Controller Hub in Intel terminology) through DMI.

Later the FSB was replaced by QPI.

Then the memory controller was moved into the CPU: the CPU used its own memory bus to access the DIMMs and QPI to communicate with the (still external) North bridge.

Later, the North bridge (IOH, IO Hub in Intel terminology) was integrated into the CPU as well; it is used to access the PCH (which now replaces the South bridge), while PCIe is used to access fast devices (like an external graphics controller).

Recently the PCH has been integrated into the CPU as well, which now exposes only PCIe, DIMM pins, SATA Express, and the other common internal buses.


As a rule of thumb, the buses used by the processors are the following (a short sketch for inspecting the resulting topology from software follows the list):

  • To other CPUs - QPI
  • To IOH - QPI (if IOH present)
  • To the uncore - QPI
  • To DIMMs - pins as mandated by the supported DRAM technology (DDR3, DDR4, ...). For Xeon v2+ Intel uses a fast SMI(2) link to connect to an off-socket memory buffer (Intel C102/C104) that handles the DIMMs and the channels in one of two configurations.
  • To PCH - DMI
  • To devices - PCIe, SATAexpress, I2C, and so on.
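
How these links show up to software can be seen in the ACPI SLIT distance table exported by the firmware; here is a minimal sketch that prints it (assuming Linux and libnuma; on a single-socket system the matrix collapses to a single entry):

```
/* numa_distances.c - sketch, compile with: gcc numa_distances.c -lnuma */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0)
        return 1;

    int nodes = numa_num_configured_nodes();

    /* Print the node-to-node distance matrix. By convention the local
       distance is 10; nodes reached over an inter-socket link such as
       QPI report a larger value (often around 20-21). */
    for (int from = 0; from < nodes; from++) {
        for (int to = 0; to < nodes; to++)
            printf("%4d", numa_distance(from, to));
        printf("\n");
    }
    return 0;
}
```

On a typical two-socket QPI machine this prints something like `10 21 / 21 10`: the diagonal entries are the local nodes, the off-diagonal ones are the nodes reached through QPI.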
Margaret Bloom
  • That covers the single-socket case, but for a multi-socket CPU, the DRAM is typically partitioned among the sockets, with any access to non-local memory going over QPI to the home socket for that DRAM and the response coming back over QPI as well. So in this case QPI is definitely on the path to RAM (although it's certainly not the whole path - the last mile, so to speak, is just a memory bus like in the local case). – BeeOnRope Jan 10 '17 at 14:50
  • @BeeOnRope So, for, say, a dual-socket (A, B) system, does socket A have QPI to socket B and to a "DRAM hub", or just to socket B's uncore (which in turn offers access to B's local DRAM)? Simply put, is there a local memory for A and B plus a non-local memory, or just A's and B's local memory? – Margaret Bloom Jan 10 '17 at 15:58
  • @BeeOnRope All the NUMA documents I have read (not many, I confess) define *remote memory* as memory connected to another CPU. [This diploma thesis](https://os.inf.tu-dresden.de/papers_ps/danielmueller-diplom.pdf), linked by Intel, suggests that Intel CPUs access remote memory through other CPUs, not directly. This is what is stated in my answer: QPI connects (nowadays) to other CPUs, and that's only possible in the multi-socket case. I think that's how Intel NUMA works, what do you think? – Margaret Bloom Jan 10 '17 at 16:08
  • For your A, B question: there would be a QPI link between socket A and socket B, and no separate links to a "DRAM hub" or anything. That is, there is just a local memory for A (which is non-local for B) and vice versa, and no memory which is non-local to both. – BeeOnRope Jan 10 '17 at 23:46
  • The QPI links are generally between sockets, and are probably best thought of as connecting the uncore components of separate sockets. In fact, discussing what QPI is for _single-socket_ systems is fraught with confusion - since you can argue that such systems don't really have QPI links per se (although some QPI concepts may be used internally in some of the internal interconnects). So QPI is _primarily_ designed to be an inter-socket interconnect (originally, an inter-CPU connection), and one of the _primary_ duties of this interconnect is to satisfy memory access. – BeeOnRope Jan 10 '17 at 23:47
  • Yes, your statement about how QPI connects is correct (indeed, the CPU includes the memory controller). Still, it is the bus used to access remote memory in multi-socket systems, and is pretty much designed with that in mind. Effectively it takes much of the role the FSB took in multi-CPU systems - all the "remote" access - while the bus between the memory controller and its local memory takes the rest. – BeeOnRope Jan 10 '17 at 23:59
  • @BeeOnRope Exactly, QPI connects to other CPUs and indirectly to their DRAM controllers. Simply put, a hardware manufacturer won't wire QPI to the DIMM sockets. Still, the question may be interpreted from a functional (and not hardware) point of view. – Margaret Bloom Jan 11 '17 at 05:54
  • No of course QPI isn't directly wired to the DIMM sockets. I don't think the OP is asking that though! It's simply "is QPI _used_ to access memory" and that's an unequivocal yes! It doesn't have to be directly connected (indeed often even things like the memory controllers aren't directly connected in the presence of buffering, etc). – BeeOnRope Jan 11 '17 at 15:07
  • @BeeOnRope Well, even a PPP 56K serial line is used to access memory then :) Think of the internet. I think it's pointless to discuss where to draw the line; we both fundamentally agree on the technical level. I changed my answer to be less explicit, as you rightfully argued. – Margaret Bloom Jan 12 '17 at 17:28
  • I think it's clear that the poster means "is this used to access the **host's** RAM". I can't see how a PPP 56K line is ever used to access a host's RAM except in convoluted scenarios. Furthermore, such access is optional - using QPI is _not_ (in hosts that have it between sockets). Imagine I asked you "is the memory controller _used to access memory_?" - I think the answer is plainly yes - but of course it's just one element on the path to memory and may or may not be directly wired to the RAM, etc. Still, requests to/from RAM obviously flow through it. – BeeOnRope Jan 12 '17 at 17:57
  • I'll also admit I didn't look up what a "PPP 56K" line is, but I made an educated guess :) – BeeOnRope Jan 12 '17 at 18:00

Yes, QPI is used to access all remote memory on multi-socket systems, and much of its design and performance is intended to support such access in a reasonable fashion (i.e., with latency and bandwidth not too much worse than local access).

Basically, most x86 multi-socket systems are lightly1 NUMA: every DRAM bank is attached to the memory controller of a particular socket; this memory is then local memory for that socket, while the remaining memory (attached to some other socket) is remote memory. All access to remote memory goes over the QPI links, and on many systems2 that is fully half of all memory accesses, or more.

So QPI is designed to be low latency and high bandwidth to make such access still perform well. Furthermore, aside from pure memory access, QPI is the link through which the cache coherence between sockets occurs, e.g., notifying the other socket of invalidations, lines which have transitioned into the shared state, etc.


1 That is, the NUMA factor is fairly low, typically less than 2 for latency and bandwidth.

2 E.g., with NUMA interleave mode on and 4 sockets, only 1/4 of interleaved pages are local to any given socket, so 75% of your accesses are remote.
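
For a rough feel of that NUMA factor from user space, one can time the same access pattern against a local and a remote allocation. This is only a minimal sketch (assuming Linux, libnuma and a two-socket machine; node numbers, buffer size and the streaming pattern are illustrative choices, and dedicated tools give far more precise numbers):

```
/* numa_factor.c - sketch, compile with: gcc -O2 numa_factor.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define LEN (256UL * 1024 * 1024)

/* Touch one byte per cache line of buf and return elapsed seconds. */
static double touch(volatile char *buf)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < LEN; i += 64)
        buf[i]++;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    if (numa_available() < 0 || numa_num_configured_nodes() < 2) {
        fprintf(stderr, "need a NUMA system with at least 2 nodes\n");
        return 1;
    }

    numa_run_on_node(0);                       /* keep the thread on socket 0    */

    char *local  = numa_alloc_onnode(LEN, 0);  /* behind socket 0's own iMC      */
    char *remote = numa_alloc_onnode(LEN, 1);  /* behind socket 1's iMC, via QPI */
    if (!local || !remote)
        return 1;

    memset(local, 1, LEN);                     /* fault all pages in first       */
    memset(remote, 1, LEN);

    double tl = touch(local), tr = touch(remote);
    printf("local %.3fs  remote %.3fs  ratio %.2f\n", tl, tr, tr / tl);

    numa_free(local, LEN);
    numa_free(remote, LEN);
    return 0;
}
```

The streaming pattern mostly reflects bandwidth and lets the prefetchers hide part of the latency, so the printed ratio understates the latency-side NUMA factor, but the local/remote asymmetry introduced by the QPI hop should still be visible.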

BeeOnRope