How much is known publicly about the details of how Apple processors work internally?

Question

Edit: in an attempt to avoid this question being closed as a reference request (though I still would appreciate references!), I will give a few general, non-link-only questions for concreteness. I would accept an answer for any of these, but the more the better.

Is the A12 in-order, or out-of-order?
How many instructions can it retire per cycle?
How many pipeline stages does it have?
What sort of cache hierarchy does it have?
Does it architecturally resemble modern Intel processors, and if not, what are the major differences?

Original question: There is a lot of publicly available documentation about how the current mainstream Intel core design works (Pentium Pro and all its descendants): both Intel’s own optimization manuals and descriptions published by WikiChip and Agner Fog.

Any curious person can learn what the pipeline stages are, what each part of the core does, and so on.

I can’t find anything similar for the Apple Ax series. Does it exist?

This is probably going to get closed as off-topic (asking for off-site resources), but upvoted anyway. Possibly it can be rephrased to directly ask for some perf details beyond the basics of how wide the pipeline is, but then it would be "too broad" because a sufficient answer would be as big as a chapter of Agner Fog's microarch guide. (Or be link-only). Anyway, IMO we should make an exception to the rule for this useful question. Hmm, maybe "how to micro-optimize for Apple's ARM CPUs?" IDK, being up-front about wanting broad details is better. — Peter Cordes, May 31 '19 at 04:49
@PeterCordes By the way, off-topic but since I have you here... do you know of any good resources for x86 other than the three I mentioned? (Agner, Intel, WikiChip). There is a quite detailed chapter on Intel in "Modern Processor Design: Fundamentals of Superscalar Processors"; not sure what else exists. — Brennan Vincent, May 31 '19 at 05:03
https://stackoverflow.com/tags/x86/info has most of the links I know of (including some of my SO answers, e.g. the Haswell partial-register renaming Q&A that isn't well documented by Intel, and Agner is wrong about it). But if you mean *how* it's designed, rather than just how it performs, Intel has published patents on a lot of things that go into their chips. (@Hadi Brais and @BeeOnRope have dug up a few patents that shed some light on various things in various SO answers and comments). And Intel presents things at conferences, another source of microarch info. — Peter Cordes, May 31 '19 at 06:05
Also IDF talks have some neat stuff; e.g. the Skylake power-management talk at IDF2015 went into a lot of detail about hardware P-states. Worth watching the slides + audio if you haven't. — Peter Cordes, May 31 '19 at 06:06
@PeterCordes yes I am interested in this stuff primarily because of intellectual curiosity and only secondarily because I want to write performant code. So I'm much more interested in "how it's designed" than "how it performs". (I *do* occasionally have to write performant code, but it's usually nothing more complicated than "make sure you don't have a DIV instruction as a loop-carried dependency"). Thanks for the pointers; I'll check them out. — Brennan Vincent, May 31 '19 at 06:09
Sometimes we can extrapolate from performance (esp when we have performance counters) to infer things about the design. e.g. [Why is XCHG reg, reg a 3 micro-op instruction on modern Intel architectures?](//stackoverflow.com/q/45766444) — Peter Cordes, May 31 '19 at 06:10
IIRC Apple processors are modified ARMs with integrated chips (a PoP). Presumably Apple bought a specific model and integrated their own units without modifying the pipeline too much (possibly to account for custom instructions), but they could very well have. [The A11 has 6 cores (2 Monsoon and 4 Mistral)](http://phonedb.net/index.php?m=processor&id=718&c=apple_a11_bionic_apl1072__apl1w72__t8015) and there's something about these uarchs on Google. — Margaret Bloom, May 31 '19 at 14:37
Some reverse-engineered details of M1: https://drive.google.com/file/d/1WrMYCZMnhsGP4o3H33ioAUKL_bjuJSPt/view - I think by Maynard Handley, frequent poster on https://realworldtech.com among other things. https://www.tomshardware.com/news/apple-m1-crowd-sourced-reverse-engineering-doc-published — Peter Cordes, Feb 11 '22 at 22:30

Olsonist · Answer 1 · 2020-04-19T13:52:57.513

Apple is an ARM architectural licensee and has developed several generations of ARM64 chips. One resource for micro-architectural detail on their chips is the Cyclone LLVM scheduler model analyzed here. The model is upstreamed into LLVM and was also released by Apple as open source. I think the Cyclone model covers all their chips.

Other resources are WikiChip and Wikipedia, which aggregate information and cite sources. Apple's patent filings provide other information. Benchmarks and reviews are available, but not at the level of detail of Agner Fog's guides.

According to Wikipedia, the A12 is out-of-order (OOO) and pairs big and little cores in a big.LITTLE-style design. The big core (Vortex) decodes 7-wide and the little core (Tempest) decodes 3-wide, with 13 and 5 execution ports respectively. I can't find retire rates.
