Questions tagged [cpu-architecture]

The hardware architecture (x86, x86_64, ARM, ...) of a CPU or microcontroller.

The hardware architecture and ISA (x86, x86_64, ARM, ...) and the micro-architectural implementation of a CPU or microcontroller.

Some of the key architectures:

  • arm - 32-bit Advanced RISC Machine.
  • arm64 - 64-bit Advanced RISC Machine.
  • ia32 - 32-bit Intel Architecture.
  • mips - 32-bit MIPS, big-endian.
  • mipsel - 32-bit MIPS, little-endian.
  • ppc - 32-bit PowerPC Architecture.
  • ppc64 - 64-bit PowerPC Architecture.

Use this tag for questions regarding features, bugs and details concerning the inner working of specific CPU architectures.

3996 questions
50
votes
2 answers

How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent

This loop runs at one iteration per 3 cycles on Intel Conroe/Merom, bottlenecked on imul throughput as expected. But on Haswell/Skylake, it runs at one iteration per 11 cycles, apparently because setnz al has a dependency on the last imul. ;…
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
50
votes
3 answers

How are x86 uops scheduled, exactly?

Modern x86 CPUs break down the incoming instruction stream into micro-operations (uops) and then schedule these uops out-of-order as their inputs become ready. While the basic idea is clear, I'd like to know the specific details of how ready…
BeeOnRope
  • 60,350
  • 16
  • 207
  • 386
50
votes
1 answer

what is a store buffer?

Can anyone explain what a load buffer is and how it differs from invalidation queues, and also the difference between store buffers and write-combining buffers? The paper by Paul E. McKenney…
harish reddy
  • 561
  • 1
  • 6
  • 6
48
votes
2 answers

How to use Fused Multiply-Add (FMA) instructions with SSE/AVX

I have learned that some Intel/AMD CPUs can do simultaneous multiply and add with SSE/AVX: FLOPS per cycle for sandy-bridge and haswell SSE2/AVX/AVX2. I'd like to know how to do this best in code and I also want to know how it's done internally in the…
user2088790
47
votes
2 answers

Can x86's MOV really be "free"? Why can't I reproduce this at all?

I keep seeing people claim that the MOV instruction can be free in x86, because of register renaming. For the life of me, I can't verify this in a single test case. Every test case I try debunks it. For example, here's the code I'm compiling with…
user541686
  • 205,094
  • 128
  • 528
  • 886
47
votes
1 answer

Determining the CPU architecture of a static library (LIB) on Windows

I just built libpng on a 64-bit Windows machine using VS2008. It produces a libpng.lib file inside the \projects\visualc71\Win32_Lib_Release directory (Configuration used being "LIB Release"). I used dumpbin to inspect this LIB…
Sridhar Ratnakumar
  • 81,433
  • 63
  • 146
  • 187
47
votes
1 answer

gcc optimization flag -O3 makes code slower than -O2

I found this topic, Why is it faster to process a sorted array than an unsorted array?, and tried to run the code. I noticed strange behavior: if I compile the code with the -O3 optimization flag, it takes 2.98605 sec to run. If I compile with -O2 it…
Mike Minaev
  • 1,912
  • 4
  • 23
  • 33
47
votes
6 answers

Why is the page size of Linux (x86) 4 KB, how is that calculated?

The default memory page size of the Linux kernel on the x86 architecture was 4 KB. I wonder how that was calculated, and why?
daisy
  • 22,498
  • 29
  • 129
  • 265
45
votes
4 answers

RISC-V spec references the word 'hart' - what does 'hart' mean?

I found references to hart on page 35 of the RISC-V 2.1 spec. However, I could not find a definition for hart in this document. Does hart refer to a hardware-thread or something more sinister?
daveW
  • 511
  • 1
  • 5
  • 10
45
votes
3 answers

What is the memory usage overhead for a 64-bit application?

From what I have found so far it's clear that programs compiled for a 64-bit architecture use twice as much RAM for pointers as their 32-bit alternatives - https://superuser.com/questions/56540/32-bit-vs-64-bit-systems. Does that mean that code…
Petr
  • 13,747
  • 20
  • 89
  • 144
45
votes
19 answers

Porting 32 bit C++ code to 64 bit - is it worth it? Why?

I am aware of some of the obvious gains of the x64 architecture (higher addressable RAM addresses, etc)... but: What if my program has no real need to run in native 64 bit mode. Should I port it anyway? Are there any foreseeable deadlines for ending…
NTDLS
  • 4,757
  • 4
  • 44
  • 70
44
votes
2 answers

Why is __int128_t faster than long long on x86-64 GCC?

This is my test code: #include #include #include using namespace std; using ll = long long; int main() { __int128_t a, b; ll x, y; a = rand() + 10000000; b = rand() % 50000; auto t0 =…
xxhxx
  • 871
  • 5
  • 11
44
votes
1 answer

Are Golang binaries portable?

Suppose I'm primarily a Linux user, but I'm developing an application in Go that I want to be cross platform. I've searched around, but I can't seem to find information to resolve the following: If I go install a binary on my amd64 Ubuntu system,…
cat
  • 3,888
  • 5
  • 32
  • 61
43
votes
9 answers

API call to get processor architecture

As part of my app I'm using the NDK and was wondering if it's worth bundling x86 and mips binaries alongside the standard ARM binaries. I figured the best way would be to track what my users actually have, is there an API call to grab the processor…
Ljdawson
  • 12,091
  • 11
  • 45
  • 60
42
votes
13 answers

Undefined symbols for architecture x86_64 on Xcode 6.1

All of a sudden Xcode threw me this error at compilation time: Undefined symbols for architecture x86_64: "_OBJC_CLASS_$_Format", referenced from: objc-class-ref in WOExerciseListViewController.o ld: symbol(s) not found for architecture…
batistomorrow
  • 674
  • 1
  • 6
  • 13