Questions tagged [cpu-architecture]

The hardware architecture and ISA (x86, x86_64, ARM, ...) and the micro-architectural implementation of a CPU or microcontroller.

Some of the key architectures:

  • arm - 32-bit Advanced RISC Machine.
  • arm64 - 64-bit Advanced RISC Machine.
  • ia32 - 32-bit Intel Architecture.
  • mips - 32-bit MIPS, big-endian.
  • mipsel - 32-bit MIPS, little-endian.
  • ppc - 32-bit PowerPC Architecture.
  • ppc64 - 64-bit PowerPC Architecture.

Use this tag for questions about the features, bugs, and inner workings of specific CPU architectures.

3996 questions
149 votes, 1 answer

Bubble sort slower with -O3 than -O2 with GCC

I made a bubble sort implementation in C, and was testing its performance when I noticed that the -O3 flag made it run even slower than no flags at all! Meanwhile -O2 was making it run a lot faster as expected. Without optimisations: time ./sort…
anon
135 votes, 6 answers

What is the "FS"/"GS" register intended for?

So I know what the following registers and their uses are supposed to be: CS = Code Segment (used for IP); DS = Data Segment (used for MOV); ES = Destination Segment (used for MOVS, etc.); SS = Stack Segment (used for SP). But what are the following…
user541686
134 votes, 10 answers

Why do x86-64 systems have only a 48 bit virtual address space?

In a book I read the following: "32-bit processors have 2^32 possible addresses, while current 64-bit processors have a 48-bit address space." My expectation was that if it's a 64-bit processor, the address space should also be 2^64. So I was…
er4z0r
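
For readers of this listing, the 48-bit limit in the excerpt above can be illustrated with a small sketch. It assumes the x86-64 rule that, with 48 implemented virtual-address bits, bits 63..47 of a valid ("canonical") address must all be equal. The macro `VA_BITS` and the helper `is_canonical` are hypothetical names chosen for this example and do not come from the question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48 implemented virtual-address bits, as in the excerpt. */
#define VA_BITS 48

/*
 * Hypothetical helper: an address is canonical when bits 63..47 are all
 * copies of bit 47, i.e. the upper 17 bits are all zeros or all ones.
 */
static bool is_canonical(uint64_t addr)
{
    uint64_t upper = addr >> (VA_BITS - 1);           /* bits 63..47 */
    uint64_t all_ones = UINT64_MAX >> (VA_BITS - 1);  /* 17 one-bits */
    return upper == 0 || upper == all_ones;
}
```

Under this assumption, 0x00007FFFFFFFFFFF is the highest canonical address in the lower half and 0xFFFF800000000000 is the lowest one in the upper half; everything in between is non-canonical.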
127 votes, 14 answers

What's the difference between a word and byte?

I've done some research. A byte is 8 bits and a word is the smallest unit that can be addressed on memory. The exact length of a word varies. What I don't understand is what's the point of having a byte? Why not say 8 bits? I asked a prof this…
user796388
126 votes, 11 answers

Floating point vs integer calculations on modern hardware

I am doing some performance-critical work in C++, and we are currently using integer calculations for problems that are inherently floating point because "it's faster". This causes a whole lot of annoying problems and adds a lot of annoying…
maxpenguin
120 votes, 16 answers

Are there any smart cases of runtime code modification?

Can you think of any legitimate (smart) uses for runtime code modification (a program modifying its own code at runtime)? Modern operating systems seem to frown upon programs that do this since this technique has been used by viruses to avoid…
113 votes, 7 answers

Detecting CPU architecture compile-time

What is the most reliable way to find out CPU architecture when compiling C or C++ code? As far as I can tell, different compilers have their own set of non-standard preprocessor definitions (_M_X86 in MSVS, __i386__, __arm__ in GCC, etc). Is there…
Alex B
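
The kind of check the excerpt above asks about can be sketched as follows. This is only an illustration, not a complete or authoritative list: it combines a few predefined macros that GCC/Clang (`__x86_64__`, `__i386__`, `__aarch64__`, `__arm__`) and MSVC (`_M_X64`, `_M_IX86`, `_M_ARM64`, `_M_ARM`) are known to define. The function name `target_arch` is hypothetical.

```c
/*
 * Hypothetical helper: returns a name for the target architecture,
 * selected by the preprocessor when the file is compiled.
 * Other compilers and architectures fall through to "unknown".
 */
static const char *target_arch(void)
{
#if defined(__x86_64__) || defined(_M_X64)
    return "x86_64";
#elif defined(__i386__) || defined(_M_IX86)
    return "x86";
#elif defined(__aarch64__) || defined(_M_ARM64)
    return "arm64";
#elif defined(__arm__) || defined(_M_ARM)
    return "arm";
#else
    return "unknown";
#endif
}
```

The 64-bit checks come first so that a toolchain defining more than one of these macros still reports the wider target.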
112 votes, 10 answers

Why is x86 ugly? Why is it considered inferior when compared to others?

I've been reading some SO archives and encountered statements against the x86 architecture. "Why do we need different CPU architecture for server & mini/mainframe & mixed-core?" says "PC architecture is a mess, any OS developer would tell you…
claws
107 votes, 3 answers

Why is the JVM stack-based and the Dalvik VM register-based?

I'm curious, why did Sun decide to make the JVM stack-based and Google decide to make the DalvikVM register-based? I suppose the JVM can't really assume that a certain number of registers are available on the target platform, since it is supposed to…
aioobe
107 votes, 3 answers

atomic operation cost

What is the cost of an atomic operation (any of compare-and-swap or atomic add/decrement)? How many cycles does it consume? Will it pause other processors on SMP or NUMA, or will it block memory accesses? Will it flush the reorder buffer in…
osgx
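
For context, the two operations named in the excerpt above (compare-and-swap and atomic add) can be sketched with C11 `<stdatomic.h>`. The helper `cas_fetch_add` is a hypothetical name for this example. The sketch shows only what the operations do, and says nothing about their cycle cost, which depends on the CPU and on contention.

```c
#include <stdatomic.h>

/*
 * Hypothetical helper: atomically adds `delta` to `*counter` using a
 * compare-and-swap retry loop, and returns the value seen before the add.
 */
static int cas_fetch_add(atomic_int *counter, int delta)
{
    int expected = atomic_load(counter);
    /* On failure, `expected` is refreshed with the current value,
       so the loop simply retries with the new value. */
    while (!atomic_compare_exchange_weak(counter, &expected, expected + delta)) {
    }
    return expected;
}
```

The standard `atomic_fetch_add` gives the same result in a single call; the loop form is shown because it is the general pattern for building other read-modify-write operations on top of compare-and-swap.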
103 votes, 5 answers

Why is a conditional move not vulnerable to Branch Prediction Failure?

After reading this post (answer on StackOverflow) (at the optimization section), I was wondering why conditional moves are not vulnerable to Branch Prediction Failure. I found an article on conditional moves here (PDF by AMD). Also there, they claim…
101 votes, 7 answers

Why does Intel hide internal RISC core in their processors?

Starting with Pentium Pro (P6 microarchitecture), Intel redesigned its microprocessors and used an internal RISC core under the old CISC instructions. Since Pentium Pro, all CISC instructions are divided into smaller parts (uops) and then executed by…
Goofy
99 votes, 6 answers

Enhanced REP MOVSB for memcpy

I would like to use enhanced REP MOVSB (ERMSB) to get a high bandwidth for a custom memcpy. ERMSB was introduced with the Ivy Bridge microarchitecture. See the section "Enhanced REP MOVSB and STOSB operation (ERMSB)" in the Intel optimization manual…
Z boson
98 votes, 9 answers

System where 1 byte != 8 bit?

All the time I read sentences like "don't rely on 1 byte being 8 bit in size" and "use CHAR_BIT instead of 8 as a constant to convert between bits and bytes", et cetera. What real life systems are there today, where this holds true? (I'm not sure if there…
Xeo
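
The advice quoted in the excerpt above, to use `CHAR_BIT` rather than a literal 8, can be sketched in a few lines. The helper name `bits_in` is hypothetical. The only facts relied on are that `<limits.h>` defines `CHAR_BIT` and that the C standard requires it to be at least 8.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of bits in an object that occupies
 * `size` bytes, without assuming that a byte has exactly 8 bits.
 */
static size_t bits_in(size_t size)
{
    return size * CHAR_BIT;
}
```

For example, `bits_in(sizeof(int))` gives the storage width of an `int` in bits on whatever platform the code is compiled for.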
98 votes, 4 answers

What are stalled-cycles-frontend and stalled-cycles-backend in 'perf stat' result?

Does anybody know the meaning of stalled-cycles-frontend and stalled-cycles-backend in the perf stat result? I searched on the internet but did not find the answer. Thanks $ sudo perf stat ls Performance counter stats for…
Dafan