Highest Voted 'micro-architecture' Questions

4

votes

1 answer

Is mov r64, m64 one cycle or two cycle latency?

I'm on IvyBridge and wrote the following simple program to measure the latency of mov: section .bss align 64 buf: resb 64 section .text global _start _start: mov rcx, 1000000000 xor rax, rax loop: mov rax, [buf+rax] …

asked Jan 07 '19 at 10:44

user10865622

455
3
11

4

votes

1 answer

When making a read request to DRAM, why do we need to read the tag and data, not data only?

I am going through David Patterson and John Hennessy's computer architecture book. In Chapter 2, it mentions that we may need to make two separate requests to read the tag and data in two cycles if we store tags in DRAM. My question is why do we need to…

memory cpu-architecture cpu-cache micro-architecture

asked Dec 31 '18 at 21:02

Shibo Chen

77
6

4

votes

3 answers

How do modern x86 processors actually compute multiplications?

I was watching some lecture on algorithms, and the professor used multiplication as an example of how naive algorithms can be improved... It made me realize that multiplication is not that obvious, although when I am coding I just consider it a…

algorithm x86 cpu-architecture alu micro-architecture

asked Oct 14 '14 at 21:03

speeder

6,197
5
34
51
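For readers skimming this listing, the "naive algorithm" the excerpt alludes to can be sketched as schoolbook shift-and-add multiplication. This is only an illustration of that baseline; the function name `shift_add_multiply` and the `width` parameter are assumptions made here, and the sketch makes no claim about how any particular x86 processor implements its multiplier.

```python
def shift_add_multiply(a: int, b: int, width: int = 64) -> int:
    """Multiply two unsigned integers with schoolbook shift-and-add.

    Illustrative only: `width` is an assumed operand size in bits, and
    this is not a description of real multiplier hardware.
    """
    mask = (1 << width) - 1
    a &= mask  # truncate both operands to `width` bits
    b &= mask
    product = 0
    for bit in range(width):
        # For every set bit in b, add a copy of a shifted to that position.
        if (b >> bit) & 1:
            product += a << bit
    return product  # full result, up to 2 * width bits


if __name__ == "__main__":
    print(shift_add_multiply(6, 7))  # prints 42
```

The loop performs one conditional addition per operand bit, which is the linear-in-width cost that faster algorithms and parallel hardware structures aim to reduce.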

3

votes

0 answers

Intel Alder Lake performance degradation after spin wait

I'm tuning my program for low latency. I have a tight calculation function calc(); which uses SIMD floating-point instructions heavily. I tested the performance of calc(); using the perf command. It shows that this calc function is using ~10k…

performance intel micro-architecture

asked Feb 25 '23 at 15:49

VariantF

41
1
5

3

votes

0 answers

Handling x86-64 microarchitecture levels in Debian package names

I'm planning to build different versions of an intensive numerical program for x86-64 architectures. Conveniently, in 2020, 4 levels of x86-64 microarchitecture were defined that can be passed to the compiler via the "-march" flag. Thus, for GCC 11 (and…

debian x86-64 cpu-architecture micro-architecture debian-packaging

asked Feb 27 '22 at 05:28

Justin JRTI

56
4
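As a rough illustration of the levels the excerpt mentions, the sketch below maps a level number to the corresponding `-march` value accepted by GCC 11 and later (`x86-64` for the baseline, `x86-64-v2` through `x86-64-v4` for the higher levels). The helper names `march_flag` and `package_name`, and the package-name suffix scheme in particular, are hypothetical; they are not an established Debian convention.

```python
def march_flag(level: int) -> str:
    """Return the -march flag for an x86-64 microarchitecture level (1-4)."""
    if level not in (1, 2, 3, 4):
        raise ValueError(f"unknown x86-64 level: {level}")
    # Level 1 is the baseline and is spelled plain "x86-64" here.
    return "-march=x86-64" if level == 1 else f"-march=x86-64-v{level}"


def package_name(base: str, level: int) -> str:
    """Build a HYPOTHETICAL package name carrying the level as a suffix."""
    march_flag(level)  # reuse the validation above
    return f"{base}-x86-64-v{level}"


if __name__ == "__main__":
    for lvl in (1, 2, 3, 4):
        print(package_name("mysolver", lvl), march_flag(lvl))
```

Keeping the flag mapping separate from the naming helper means the naming scheme, which is the open part of the question, can be swapped without touching the compiler flags.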

3

votes

1 answer

How do I get the CPU information for my computer, i.e. functional units, latencies, etc.?

I'm trying to learn assembly, and in the book I'm reading I came across functional units and their latencies shown in tables in the textbook. I was wondering what are the functional units of my CPU and what are the latencies? integer addition,…

assembly x86 cpu-architecture micro-architecture

asked Jun 23 '21 at 14:30

Megan Darcy

530
5
15

3

votes

1 answer

Execute operations of the same instruction separately in an OoO processor

Imagine that we have an instruction which has been divided into 3 micro-operations, and we have an out-of-order processor. My question is: must these 3 uops be executed sequentially, or can the processor interleave these uops with other uops from…

assembly x86 cpu-architecture instructions micro-architecture

asked Jun 13 '20 at 22:37

isma

143
1
6

3

votes

0 answers

Does the store buffer hold physical or virtual addresses on modern x86?

Modern Intel and AMD chips have large store buffers to buffer stores before commit to the L1 cache. Conceptually, these entries hold the store data and store address. For the address part, do these buffer entries hold virtual or physical addresses,…

x86 intel cpu-architecture amd-processor micro-architecture

asked Apr 13 '20 at 15:16

BeeOnRope

60,350
16
207
386

3

votes

0 answers

How are micro-ops arranged in the Instruction Decode Queue (IDQ)?

Something I've been wondering for a while, but firstly, one assumption to make is that all μops produced by a macro-op could have the same rip as the macro-op (I'm pretty sure that the IQ would have a rip for each IFETCH block and the decoders could…

x86 cpu intel cpu-architecture micro-architecture

asked May 23 '19 at 19:10

Lewis Kelsey

4,129
1
32
42

3

votes

1 answer

Will CPUID serialize speculative data caching?

I found the description of a speculative data caching procedure from multiple instruction entries in Intel Vol.2. For example, the lfence: Processors are free to fetch and cache data speculatively from regions of system memory that use the WB,…

x86 cpu-cache microbenchmark cpuid micro-architecture

asked Jan 15 '19 at 05:01

user10865622

455
3
11

3

votes

1 answer

Why can't a dependency in a loop iteration be executed together with the previous one?

I use this code to test the impact of dependency in a loop iteration on IvyBridge: global _start _start: mov rcx, 1000000000 .for_loop: inc rax ; uop A inc rax ; uop B dec rcx ; uop C jnz .for_loop …

performance assembly x86 micro-optimization micro-architecture

asked Jan 05 '19 at 00:43

user10865622

455
3
11

3

votes

1 answer

Architecture and microarchitecture

Can someone broadly explain to me the difference between a processor’s architecture and its microarchitecture, as well as the relation between them? One should be related to its functioning parts, but the other I do not see.

system cpu cpu-architecture micro-architecture

asked Feb 04 '16 at 12:22

Philippe

700
1
7
17

3

votes

0 answers

Large run-to-run variance shown by a copy-loop implemented with MOVDQU

I am seeking an explanation for results that I am seeing in a loop that moves 64 bytes per iteration, from some source memory location to some destination memory location, using the x86 movdqu instruction (the movdqu instruction supports moving of 16-byte…

x86 x86-64 memcpy memory-bandwidth micro-architecture

asked Jun 03 '15 at 22:21

Karthik M

78
7

2

votes

0 answers

Why does FADDP D-form have higher throughput than FADDP Q-form on the Cortex-A72?

I've been operating on a rough rule of thumb that Q-form ASIMD instructions are as good or better than D-form if you've got enough data to operate on. I was therefore surprised to see when reading §3.15 of the Cortex-A72 Software Optimization Guide…

cpu-architecture simd arm64 neon micro-architecture

asked Mar 29 '21 at 13:47

Steve Cox

1,947
13
13

2

votes

1 answer

How does a load-store queue work in the presence of MSHR?

I understand the basic working of a load-store queue: when loads compute their address, they check the store queue for any prior stores to the same address, and if there is one, they get the data from the most recent store, else from…

queue cpu-architecture cpu-cache micro-architecture

asked Jan 22 '21 at 19:27

Nebula

31
1
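The store-queue lookup described in the excerpt above can be sketched as a toy model. Everything here is an assumption made for illustration: the class name `ToyStoreQueue`, its methods, and the choice of a plain dictionary as the fallback that a load reads from when no pending store matches (the excerpt is cut off before it names that fallback). It models neither MSHRs nor any real processor.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStoreQueue:
    """Toy model of store-to-load forwarding (illustrative assumption only)."""

    memory: dict = field(default_factory=dict)   # assumed backing storage
    pending: list = field(default_factory=list)  # (address, value), oldest first

    def store(self, address: int, value: int) -> None:
        # A store waits in the queue until it is committed.
        self.pending.append((address, value))

    def load(self, address: int):
        # Search pending stores from youngest to oldest for the same address.
        for addr, value in reversed(self.pending):
            if addr == address:
                return value, "forwarded"
        # No match: fall back to the assumed backing storage.
        return self.memory.get(address, 0), "memory"

    def commit_oldest(self) -> None:
        # Retire the oldest pending store into the backing storage.
        if self.pending:
            addr, value = self.pending.pop(0)
            self.memory[addr] = value


if __name__ == "__main__":
    q = ToyStoreQueue(memory={0x10: 1})
    q.store(0x10, 5)
    q.store(0x10, 9)
    print(q.load(0x10))  # prints (9, 'forwarded')
```

Searching youngest-first is what makes the load observe the most recent prior store to the same address, which is the behavior the excerpt describes.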
