Questions tagged [micro-optimization]

Micro-optimization is the meticulous tuning of small sections of code to address a perceived deficiency in some aspect of their operation (excessive memory usage, poor performance, etc.).

Micro-optimization (and optimization in general) tends to be interesting to programmers because they enjoy finding clever solutions to problems. However, micro-optimization carries the connotation of a disproportionate amount of effort being expended to extract relatively small improvements.

That's not to say that micro-optimization is bad practice in all circumstances. A small improvement in frequently executed code (such as the innermost part of a loop) can sometimes yield big overall gains in system performance. Building code for highly constrained systems such as microcontrollers often requires cleverness to eke out the most performance from limited hardware.

However, it can be tempting to indulge in the practice where it's not necessary. This wastes time that could have been used more productively, and it produces code that is difficult to follow: "clever" solutions are often harder to understand than simple ones, so a micro-optimization can hurt the maintainability of a piece of code.
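As a small illustration of that readability cost, here is a sketch in C that divides a 32-bit unsigned integer by 10 in two ways. The function names `div10_simple` and `div10_clever` are hypothetical and chosen only for this example. The second version uses the well-known multiply-and-shift trick with the constant `0xCCCCCCCD`, which is ceil(2^35 / 10).

```c
#include <stdint.h>

/* Straightforward version: the intent is obvious at a glance. */
uint32_t div10_simple(uint32_t n) {
    return n / 10;
}

/*
 * "Clever" version: multiply by ceil(2^35 / 10) = 0xCCCCCCCD in 64-bit
 * arithmetic, then shift right by 35. The result equals n / 10 for every
 * 32-bit unsigned input, but the reader has to know (or verify) why.
 */
uint32_t div10_clever(uint32_t n) {
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}
```

Mainstream optimizing compilers such as GCC and Clang typically apply this kind of transformation to division by a constant on their own, so writing it by hand often costs clarity without buying speed. Whether that holds for a particular compiler and target is best checked by inspecting the generated code.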

Programmers are advised to avoid micro-optimization unless they can make a solid case that the performance gains are worth the problems outlined above. If profiling of the code in question identifies a hot spot that is causing a performance bottleneck, that can be sufficient justification for a micro-optimization.
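The "measure first" advice can be sketched with a crude timing harness in C. This is only an illustration, not a substitute for a real profiler: `sum_array` stands in for a hypothetical suspected hot spot, and `time_sum` is a hypothetical helper that reports the CPU time spent on repeated calls using the standard `clock()` function.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical candidate hot spot: sums the elements of an array. */
long sum_array(const int *data, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}

/*
 * Returns the CPU time, in seconds, spent on `reps` calls to sum_array.
 * The accumulated result is written to *sink so that the work stays
 * observable; an aggressive optimizer may still simplify the loop, so
 * treat the numbers as a rough guide only.
 */
double time_sum(const int *data, size_t n, int reps, long *sink) {
    assert(reps > 0);
    long acc = 0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        acc += sum_array(data, n);
    }
    clock_t end = clock();
    *sink = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Timing the same workload before and after a change, over enough repetitions to rise above the clock's resolution, gives at least a rough indication of whether the micro-optimization paid for its added complexity.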

900 questions

6 answers

Does using xor reg, reg give advantage over mov reg, 0?

There are two well-known ways to set an integer register to zero on x86: either mov reg, 0 or xor reg, reg. There's an opinion that the second variant is better since the value 0 is not stored in the code and that saves several bytes of…

assembly x86 micro-optimization

asked Jul 16 '09 at 06:06

sharptooth

4 answers

How to force GCC to assume that a floating-point expression is non-negative?

There are cases where you know that a certain floating-point expression will always be non-negative. For example, when computing the length of a vector, one does sqrt(a[0]*a[0] + ... + a[N-1]*a[N-1]) (NB: I am aware of std::hypot, this is not…

c++ gcc assembly floating-point micro-optimization

asked Aug 27 '19 at 11:35

lisyarus

4 answers

"enter" vs "push ebp; mov ebp, esp; sub esp, imm" and "leave" vs "mov esp, ebp; pop ebp"

What is the difference between the enter and push ebp mov ebp, esp sub esp, imm instructions? Is there a performance difference? If so, which is faster and why do compilers always use the latter? Similarly with the leave and mov esp, ebp pop …

assembly x86 stack micro-optimization stack-frame

asked May 11 '11 at 05:59

小太郎

9 answers

Divide by 10 using bit shifts?

Is it possible to divide an unsigned integer by 10 by using pure bit shifts, addition, subtraction and maybe multiply? Using a processor with very limited resources and slow divide.

math bit micro-optimization low-level integer-division

asked Apr 05 '11 at 21:04

Thomas O

4 answers

Weird use of `?:` in `typeid` code

In one of the projects I'm working on, I'm seeing this code struct Base { virtual ~Base() { } }; struct ClassX { bool isHoldingDerivedObj() const { return typeid(1 ? *m_basePtr : *m_basePtr) == typeid(Derived); } Base *m_basePtr; }; I…

c++ conditional-operator micro-optimization typeid

asked Jul 22 '11 at 20:43

Johannes Schaub - litb

1 answer

Why does mulss take only 3 cycles on Haswell, different from Agner's instruction tables? (Unrolling FP loops with multiple accumulators)

I'm a newbie at instruction optimization. I did a simple analysis on a simple function dotp which is used to get the dot product of two float arrays. The C code is as follows: float dotp( const float x[], const float y[],…

c assembly x86 sse micro-optimization

asked Jul 15 '17 at 01:14

Forward

4 answers

Why do none of the major compilers optimize this conditional store that checks if the value is already set?

I stumbled across this Reddit post which is a joke on the following code snippet, void f(int& x) { if (x != 1) { x = 1; } } void g(int& x) { x = 1; } saying that the two functions are not equivalent to 'the compiler'. I was…

c++ compiler-optimization micro-optimization

asked May 16 '23 at 09:40

chrysante

3 answers

What does `rep ret` mean?

I was testing some code on Visual Studio 2008 and noticed security_cookie. I can understand the point of it, but I don't understand what the purpose of this instruction is. rep ret /* REP to avoid AMD branch prediction penalty */ Of course I…

assembly x86 micro-optimization branch-prediction

asked Dec 11 '13 at 17:48

Devolus

2 answers

INC instruction vs ADD 1: Does it matter?

From Ira Baxter answer on, Why do the INC and DEC instructions not affect the Carry Flag (CF)? Mostly, I stay away from INC and DEC now, because they do partial condition code updates, and this can cause funny stalls in the pipeline, and ADD/SUB…

performance assembly x86 increment micro-optimization

asked Apr 08 '16 at 22:06

Gilgamesz

16 answers

' ... != null' or 'null != ....' best performance?

I wrote two methods to check their performance public class Test1 { private String value; public void notNull(){ if( value != null) { //do something } } public void nullNot(){ if( null != value) { //do something } } } and checked…

java performance micro-optimization

asked Mar 08 '10 at 00:14

asela38

3 answers

Do java finals help the compiler create more efficient bytecode?

Possible Duplicate: Does use of final keyword in Java improve the performance? The final modifier has different consequences in java depending on what you apply it to. What I'm wondering is if additionally it might help the compiler create more…

java optimization micro-optimization

asked Dec 02 '11 at 09:45

Miquel

2 answers

How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent

This loop runs at one iteration per 3 cycles on Intel Conroe/Merom, bottlenecked on imul throughput as expected. But on Haswell/Skylake, it runs at one iteration per 11 cycles, apparently because setnz al has a dependency on the last imul. ;…

assembly x86 intel cpu-architecture micro-optimization

asked Aug 13 '17 at 12:05

Peter Cordes

2 answers

Avoiding the overhead of C# virtual calls

I have a few heavily optimized math functions that take 1-2 nanoseconds to complete. These functions are called hundreds of millions of times per second, so call overhead is a concern, despite the already-excellent performance. In order to keep the…

c# virtual-functions micro-optimization

asked Dec 14 '18 at 19:35

Haus

2 answers

Can x86's MOV really be "free"? Why can't I reproduce this at all?

I keep seeing people claim that the MOV instruction can be free in x86, because of register renaming. For the life of me, I can't verify this in a single test case. Every test case I try debunks it. For example, here's the code I'm compiling with…

c assembly x86 cpu-architecture micro-optimization

asked May 24 '17 at 22:16

user541686

1 answer

Why does Intel's compiler prefer NEG+ADD over SUB?

In examining the output of various compilers for a variety of code snippets, I've noticed that Intel's C compiler (ICC) has a strong tendency to prefer emitting a pair of NEG+ADD instructions where other compilers would use a single SUB…

assembly x86 micro-optimization icc

asked Jun 02 '17 at 13:24

Cody Gray - on strike

