Questions tagged [cpu-architecture]

The hardware architecture and ISA (x86, x86_64, ARM, ...) and the micro-architectural implementation of a CPU or microcontroller.

Some of the key architectures:

  • arm - 32-bit ARM (Advanced RISC Machine).
  • arm64 - 64-bit ARM (AArch64).
  • ia32 - 32-bit Intel Architecture (x86).
  • mips - MIPS (Microprocessor without Interlocked Pipelined Stages), big-endian.
  • mipsel - MIPS, little-endian.
  • ppc - 32-bit PowerPC Architecture.
  • ppc64 - 64-bit PowerPC Architecture.
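
Different tools spell the same architecture differently, so it can help to normalize names before comparing them. A minimal sketch in Python, assuming a hypothetical `ALIASES` table and helper `short_arch_name` (both invented here for illustration, and the alias list is not exhaustive), that maps the string from `platform.machine()` onto the short names in the list above:

```python
import platform

# Hypothetical alias table (an assumption for illustration, not exhaustive):
# common platform.machine() spellings mapped onto the short names listed above.
ALIASES = {
    "i386": "ia32",
    "i686": "ia32",
    "x86": "ia32",
    "armv6l": "arm",
    "armv7l": "arm",
    "aarch64": "arm64",
    "arm64": "arm64",
    "ppc": "ppc",
    "ppc64": "ppc64",
    "ppc64le": "ppc64",
    "mips": "mips",
    "mipsel": "mipsel",
}

def short_arch_name(machine: str) -> str:
    """Return the short name for a machine string, or the lowercased input if unknown."""
    key = machine.strip().lower()
    return ALIASES.get(key, key)

# Report the short name for the machine this script runs on.
print(short_arch_name(platform.machine()))
```

Names that are not in the table are passed through unchanged, so an unexpected architecture is reported as-is rather than misclassified.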

Use this tag for questions about features, bugs, and details of the inner workings of specific CPU architectures.

3996 questions
64 votes, 4 answers

Difference between x86, x32, and x64 architectures?

Please explain the difference between x86, x32, and x64. It's a bit confusing when it comes to x86 and x32 because most of the time 32-bit programs run on x86...
getjish
  • 817
  • 1
  • 7
  • 6
62 votes, 10 answers

Maximum memory which malloc can allocate

I was trying to figure out the maximum amount of memory I can malloc on my machine (1 GB RAM, 160 GB HD, Windows platform). I read that the maximum memory malloc can allocate is limited to physical memory (on the heap). Also, when a program exceeds…
Vikas
  • 1,422
  • 2
  • 12
  • 16
62 votes, 4 answers

Micro fusion and addressing modes

I have found something unexpected (to me) using the Intel® Architecture Code Analyzer (IACA). The following instruction, which uses [base+index] addressing, does not micro-fuse according to IACA: addps xmm1, xmmword ptr [rsi+rax*1]. However, if I use…
Z boson
  • 32,619
  • 11
  • 123
  • 226
61 votes, 6 answers

Determine target ISA extensions of binary file in Linux (library or executable)

We have an issue related to a Java application running under a (rather old) FC3 on an Advantech POS board with a VIA C3 processor. The Java application has several compiled shared libs that are accessed via JNI. The VIA C3 processor is supposed to be…
60 votes, 2 answers

FLOPS per cycle for sandy-bridge and haswell SSE2/AVX/AVX2

I'm confused about how many FLOPs per cycle per core can be done with Sandy Bridge and Haswell. As I understand it, it should be 4 FLOPs per cycle per core for SSE and 8 FLOPs per cycle per core for AVX/AVX2. This seems to be verified…
user2088790
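
The 4 and 8 in this question follow from simple lane arithmetic. A rough sketch in Python, assuming single-precision (32-bit) elements and a hypothetical helper `peak_flops_per_cycle`; the issue rate and the operations per lane are inputs you would have to look up for a given core, not values stated in the question:

```python
def peak_flops_per_cycle(vector_bits, element_bits=32,
                         instrs_per_cycle=1, flops_per_lane=1):
    """Upper bound on floating-point operations per cycle per core.

    vector_bits      -- register width (128 for SSE, 256 for AVX/AVX2)
    element_bits     -- element width (32 for single precision)
    instrs_per_cycle -- vector FP instructions issued per cycle (assumed input)
    flops_per_lane   -- operations each instruction performs per lane
                        (1 for an add or a multiply, 2 for a fused multiply-add)
    """
    lanes = vector_bits // element_bits
    return lanes * instrs_per_cycle * flops_per_lane

print(peak_flops_per_cycle(128))  # 4: one 128-bit SSE instruction per cycle
print(peak_flops_per_cycle(256))  # 8: one 256-bit AVX instruction per cycle
# Assumed scenario: two fused multiply-adds issued per cycle on 256-bit vectors.
print(peak_flops_per_cycle(256, instrs_per_cycle=2, flops_per_lane=2))  # 32
```

With one vector instruction per cycle this reproduces the figures in the question; a core that can issue more than one vector instruction per cycle, or that has fused multiply-add, scales the bound accordingly.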
58 votes, 12 answers

How can I determine for which platform an executable is compiled?

I have a need to work with Windows executables which are made for x86, x64, and IA64. I'd like to programmatically figure out the platform by examining the files themselves. My target language is PowerShell but a C# example will do. Failing either…
halr9000
  • 9,879
  • 5
  • 33
  • 34
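
For Windows executables, the target platform is recorded in the `Machine` field of the COFF header that follows the `PE\0\0` signature. The question asks for PowerShell or C#; the sketch below uses Python instead, and the helper name `pe_machine` and the `MACHINE_NAMES` table are assumptions made for illustration. It reads the 4-byte offset at 0x3C of the DOS header, checks the signature, and decodes the 2-byte machine value:

```python
import struct

# Machine values from the PE/COFF format for the three platforms in the question.
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64", 0x0200: "IA64"}

def pe_machine(data: bytes) -> str:
    """Return the target platform recorded in a PE image's COFF header."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew: little-endian offset of the PE signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The Machine field is the first 2 bytes after the signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, hex(machine))

# Synthetic header for demonstration only: a real file would be read from disk.
fake = b"MZ" + bytes(0x3A) + struct.pack("<I", 0x40) + b"PE\0\0" + struct.pack("<H", 0x8664)
print(pe_machine(fake))  # x64
```

For a real file, pass the file's bytes, for example `pe_machine(open(path, "rb").read())`. Machine values outside the table are returned as a hex string rather than guessed.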
56 votes, 5 answers

How do SMP cores, processes, and threads work together exactly?

On a single core CPU, each process runs in the OS, and the CPU jumps around from one process to another to best utilize itself. A process can have many threads, in which case the CPU runs through these threads when it is running on the respective…
56 votes, 9 answers

What are some examples of non-Von Neumann architectures?

If I understand correctly, modern computers are modeled after the Von Neumann architecture. I have sometimes seen references to alternatives, but haven't really seen any very good descriptions of how non-Von Neumann architectures would be organised…
Steve
  • 1,849
  • 2
  • 19
  • 19
54 votes, 12 answers

How does an assembly instruction turn into voltage changes on the CPU?

I've been working in C and CPython for the past 3 - 5 years. Consider that my base of knowledge here. If I were to issue an assembly instruction such as MOV AL, 61h to a processor that supported it, what exactly is inside the processor that interprets…
user407896
  • 950
  • 1
  • 8
  • 12
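
Before any circuitry is involved, the assembler turns `MOV AL, 61h` into bytes; in the x86 encoding that is `B0 61` (opcode `B0` plus the register number, followed by the 8-bit immediate). A toy sketch in Python, assuming a hypothetical helper `decode_mov_r8_imm8` that handles only this one instruction form, of the bit-field extraction a decoder performs:

```python
# 8-bit register names in x86 encoding order (register numbers 0..7).
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]

def decode_mov_r8_imm8(code: bytes):
    """Decode the 'MOV r8, imm8' form (opcodes B0..B7); anything else is rejected."""
    opcode = code[0]
    if opcode & 0xF8 != 0xB0:
        raise ValueError("not a MOV r8, imm8 instruction")
    reg = REG8[opcode & 0x07]  # low 3 bits of the opcode select the register
    imm = code[1]              # the byte after the opcode is the immediate
    return reg, imm

reg, imm = decode_mov_r8_imm8(bytes([0xB0, 0x61]))
print(f"MOV {reg}, {imm:02X}h")  # MOV AL, 61h
```

In hardware the same masking is done by wiring: the low opcode bits drive the register-select lines and the immediate byte is routed to the register's data inputs. This sketch only models the field split, not the timing or the electrical behaviour.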
53 votes, 4 answers

How does direct mapped cache work?

I am taking a System Architecture course and I have trouble understanding how a direct mapped cache works. I have looked in several places, and each explains it in a different manner, which confuses me even more. What I cannot understand is what…
Percentage
  • 700
  • 2
  • 6
  • 9
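
In a direct mapped cache every address maps to exactly one cache line: the low bits select the byte within the line, the next bits select the line (the index), and the remaining high bits are stored as the tag and compared on each access. A minimal simulation in Python, assuming 64 lines of 64 bytes (arbitrary sizes chosen for illustration) and a hypothetical class `DirectMappedCache`:

```python
class DirectMappedCache:
    """Tag store of a direct mapped cache; sizes must be powers of two."""

    def __init__(self, num_lines=64, line_size=64):
        self.offset_bits = line_size.bit_length() - 1  # log2(line_size)
        self.index_bits = num_lines.bit_length() - 1   # log2(num_lines)
        self.tags = [None] * num_lines                 # None marks an empty line

    def split(self, addr):
        """Split an address into (tag, index, offset)."""
        offset = addr & ((1 << self.offset_bits) - 1)
        index = (addr >> self.offset_bits) & ((1 << self.index_bits) - 1)
        tag = addr >> (self.offset_bits + self.index_bits)
        return tag, index, offset

    def access(self, addr):
        """Return True on a hit; on a miss, replace the line and return False."""
        tag, index, _ = self.split(addr)
        hit = self.tags[index] == tag
        if not hit:
            self.tags[index] = tag
        return hit

cache = DirectMappedCache()
print(cache.split(0x1234))   # (1, 8, 52)
print(cache.access(0x1000))  # False: cold miss
print(cache.access(0x1004))  # True: same 64-byte line
print(cache.access(0x2000))  # False: same index, different tag
print(cache.access(0x1000))  # False: the line was evicted by 0x2000
```

The last two accesses show the characteristic weakness of this design: two addresses that share an index evict each other even when the rest of the cache is empty.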
53 votes, 2 answers

Why is division more expensive than multiplication?

I am not really trying to optimize anything, but I heard this from programmers so often that I took it as truth. After all, they are supposed to know this stuff. But I wonder why division is actually slower than multiplication?…
Joan Venge
  • 315,713
  • 212
  • 479
  • 689
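
One reason commonly given is that long division is inherently sequential: each quotient bit depends on the remainder left by the previous step, whereas the partial products of a multiplication can be generated at once and summed in parallel. A sketch in Python of textbook restoring division for unsigned integers, using a hypothetical helper `restoring_divide`; real dividers use faster variants, so this illustrates the dependency chain rather than any specific CPU:

```python
def restoring_divide(dividend, divisor, bits=32):
    """Unsigned shift-and-subtract division; returns (quotient, remainder).

    Assumes 0 <= dividend < 2**bits and divisor > 0.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    remainder = 0
    quotient = 0
    for i in range(bits - 1, -1, -1):
        # Bring down the next dividend bit. This step needs the remainder
        # produced by the previous iteration, so iterations cannot overlap.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << i
    return quotient, remainder

print(restoring_divide(100, 7))  # (14, 2)
```

The loop runs once per result bit, and no iteration can start before the previous one finishes. That serial chain is what the faster hardware algorithms shorten but do not eliminate.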
52 votes, 7 answers

Where is the L1 memory cache of Intel x86 processors documented?

I am trying to profile and optimize algorithms and I would like to understand the specific impact of the caches on various processors. For recent Intel x86 processors (e.g. Q9300), it is very hard to find detailed information about cache structure.…
Brent Bradburn
  • 51,587
  • 17
  • 154
  • 173
52 votes, 4 answers

How does x86 pause instruction work in spinlock *and* can it be used in other scenarios?

The pause instruction is commonly used in a spinlock's test loop, while some other thread owns the spinlock, to mitigate the tight loop. It's said to be equivalent to some NOP instructions. Could somebody tell me how exactly it works for…
Infinite
  • 3,198
  • 4
  • 27
  • 36
52 votes, 2 answers

Program Counter and Instruction Register

The program counter holds the address of the instruction that should be executed next, while the instruction register holds the actual instruction to be executed. Wouldn't one of them be enough? And what is the length of each one of these registers?…
Benyamin Noori
  • 860
  • 1
  • 8
  • 24
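
The two registers hold different things at the same moment, which a toy fetch-decode-execute loop makes visible. A minimal sketch in Python, assuming an invented four-instruction machine (`LOAD`, `ADD`, `JMP`, `HALT`) and a hypothetical function `run`; it is not modeled on any real ISA:

```python
def run(program):
    """Execute a list of (opcode, operand) pairs and return the accumulator."""
    pc = 0    # program counter: address of the NEXT instruction to fetch
    acc = 0   # accumulator
    while True:
        ir = program[pc]  # fetch: the instruction register gets the instruction itself
        pc += 1           # the PC moves on while the IR still holds the current instruction
        op, arg = ir      # decode from the IR, not from memory
        if op == "LOAD":
            acc = arg
        elif op == "ADD":
            acc += arg
        elif op == "JMP":
            pc = arg      # a jump overwrites the PC; the IR is untouched
        elif op == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode {op!r}")

program = [
    ("LOAD", 2),   # 0
    ("JMP", 3),    # 1: skip the next instruction
    ("ADD", 100),  # 2: never executed
    ("ADD", 5),    # 3
    ("HALT", 0),   # 4
]
print(run(program))  # 7
```

During execution of the `JMP` at address 1, `ir` holds `("JMP", 3)` while `pc` has already advanced to 2 and is then overwritten with 3, so neither value can be derived from the other without another memory access.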
51 votes, 2 answers

Is x86 RISC or CISC?

According to Wikipedia, x86 is a CISC design, but I have also heard/read that it is RISC. Which is correct? I'd also like to know why it is CISC or RISC. What determines if a design is RISC or CISC? Is it just the number of machine language…
wowpatrick
  • 5,082
  • 15
  • 55
  • 86