Questions tagged [micro-architecture]
107 questions
2
votes
0 answers
Do cache block replacement policies in x86 prefer clean blocks over dirty blocks?
Do cache block replacement policies in x86 systems prefer clean blocks over dirty blocks? I am interested to know whether cache-line flush instructions such as clwb influence the cache replacement policy, since clwb retains the block in a clean state…

Arun Kp
- 392
- 2
- 13
2
votes
0 answers
How is an instruction executed, and how is this realized in gem5?
I am learning how a program runs on gem5 and have read some books, but I am still confused about parts of the program's execution. Is my understanding below correct?
First, the computer instruction is placed in the ICache, and…

Gerrie
- 736
- 3
- 18
2
votes
0 answers
Why did Intel remove the 16-byte branch target alignment Coding Rule from the Optimization Reference Manual?
Previous versions of the Intel® 64 and IA-32 Architectures Optimization Reference Manual have contained this Coding Rule:
Assembly/Compiler Coding Rule 12. (M impact, H generality)
All branch targets should be 16-byte aligned.
The May 2020 version…

Olsonist
- 2,051
- 1
- 20
- 35
2
votes
1 answer
Why do L1 caches usually have a split design, while L2 and L3 caches have a unified design?
I was reading the pros and cons of split design vs unified design of caches in this thread.
Based on my understanding, the primary advantage of the split design is that it enables us to place the instruction cache close to the instruction…

jhagk
- 111
- 1
- 9
2
votes
1 answer
What is an assisted/assisting load?
The RIDL exploit requires that the attacker trigger a page fault to be able to read stale data from the Line Fill Buffer. But according to About the RIDL vulnerabilities and the "replaying" of loads, an assisted load can also be used.
That question…

Daniel Näslund
- 2,300
- 3
- 22
- 27
2
votes
1 answer
Pipeline Processor Design to handle both branch outcomes
I have recently been studying pipelined processor architecture, mainly in the context of Y86-64. I have just read about branch prediction and how, in case of a mispredicted branch, the Fetch, Decode and Execute pipeline registers have…

dkapur17
- 476
- 1
- 3
- 11
2
votes
1 answer
Atomic operations in POWER other than LL/SC?
Do any of the POWER ISAs include atomic operations other than LL/SC, e.g., atomic addition, exchange, and so on?

BeeOnRope
- 60,350
- 16
- 207
- 386
2
votes
1 answer
How can the AVR microarchitecture fetch 2 operands from the GP registers to the ALU in only 1 clock cycle?
According to the AVR microcontroller datasheets, as well as the AVR instruction set datasheet, certain instructions, for example ADD, can fetch 2 operands stored in the GP registers during only 1 clock transition to…

Fabi
- 23
- 2
2
votes
1 answer
Why do 16 accesses stepping by 4K in main memory cause no L1d cache misses?
I'm on an IvyBridge and want to test the L1d cache organization. My understanding is as follows:
On IvyBridge, the L1d cache has 32K capacity, 64B cache lines, and is 8-way set associative. Therefore it has 32K/(64*8) = 64 sets. Given a main memory address, the…

user10865622
- 455
- 3
- 11
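The set arithmetic quoted in the IvyBridge L1d excerpt above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the geometry given in the question (32K capacity, 64B lines, 8 ways) and assumes plain modulo indexing on the line address. The helper names `set_index` and `sets_touched` are hypothetical and do not come from the question.

```python
# Geometry quoted in the question: 32 KiB, 64-byte lines, 8-way set associative.
CACHE_BYTES = 32 * 1024
LINE_BYTES = 64
WAYS = 8

# 32K / (64 * 8) = 64 sets, matching the excerpt.
NUM_SETS = CACHE_BYTES // (LINE_BYTES * WAYS)


def set_index(addr: int) -> int:
    """Set index of a byte address, assuming simple modulo indexing."""
    return (addr // LINE_BYTES) % NUM_SETS


def sets_touched(base: int, stride: int, count: int) -> set:
    """Distinct set indices hit by `count` accesses spaced `stride` bytes apart."""
    return {set_index(base + i * stride) for i in range(count)}


if __name__ == "__main__":
    print("sets:", NUM_SETS)
    print("sets touched by 16 accesses, 4 KiB apart:", sets_touched(0, 4096, 16))
```

Under these assumptions a 4K stride spans exactly 64 lines, so every access lands in the same set index. The sketch shows only the address mapping; it says nothing about what the hardware counters in the question actually measured.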
2
votes
1 answer
Why does jnz count no cycles?
I found in an online resource that IvyBridge has 3 ALUs, so I wrote a small program to test:
global _start
_start:
    mov rcx, 10000000
.for_loop:              ; do {
    inc rax
    inc rbx
    dec rcx
    jnz .for_loop       ; } while (--rcx)
…

user10865622
- 455
- 3
- 11
2
votes
1 answer
How does out-of-order execution work with conditional instructions, e.g. CMOVcc on Intel or ADDNE (add if not equal) on ARM?
I know they can only correctly execute after the instructions before them in the Re-Order Buffer are committed. My question is: do modern processors hold them until they are last in the ROB, or are prediction counters/structures used even for predicting the…

Tiwari
- 1,014
- 2
- 12
- 22
2
votes
1 answer
Purpose of the Zilog Z80 I and R registers
There are I and R registers in the control section of the Z80 CPU. What are their purpose and usage?

Mikhail Veselov
- 21
- 3
2
votes
1 answer
What are the latencies of control and move instructions on Intel's newer architectures?
I am looking at the Intel Architectures Optimization Reference Manual 2017 (Page 759). I am interested in the Haswell and Skylake architectures. MOV, PUSH, JMP, CALL instructions are intentionally omitted in that table. No latency information is given.…

soham
- 1,508
- 6
- 30
- 47
2
votes
1 answer
How does cacheline to register data transfer work?
Suppose I have an int array of 10 elements. With a 64 byte cacheline, it can hold 16 array elements from arr[0] to arr[15].
I would like to know what happens when you fetch, for example, arr[5] from the L1 cache into a register. How does this…

Remus
- 33
- 3
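For the cache-line question above, the address arithmetic that picks one element out of a line can be sketched as follows. This is a rough illustration rather than a description of any particular CPU's load path. It assumes 4-byte `int` elements and an array whose base address is 64-byte aligned. The function name `locate` and the base address `0x1000` are hypothetical.

```python
LINE_BYTES = 64   # cache line size from the question
INT_BYTES = 4     # assumed size of int

ELEMS_PER_LINE = LINE_BYTES // INT_BYTES  # 16 elements fit in one line


def locate(base_addr: int, index: int) -> tuple:
    """Return (line start address, byte offset within the line) for arr[index]."""
    addr = base_addr + index * INT_BYTES
    offset = addr % LINE_BYTES
    return addr - offset, offset


if __name__ == "__main__":
    line, offset = locate(0x1000, 5)
    # arr[5] sits 20 bytes into the line that starts at the (assumed) base address.
    print(hex(line), offset)
```

The load only needs the 4 bytes starting at that offset. The rest of the line stays in L1 and is not copied into the register.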
2
votes
2 answers
Status of program counter during hlt
In the Intel 8085 microprocessor, precisely at what point (t state) does the program counter get updated? Is it just after t1 (i.e., just when the current address in the PC is placed on the address bus) or at t3, when the instruction fetch is being…

Yashwanth Reddy
- 21
- 1