Questions tagged [micro-optimization]

Micro-optimization is the meticulous tuning of small sections of code to address a perceived deficiency in some aspect of its operation (excessive memory usage, poor performance, etc.).

Micro-optimization (and optimization in general) tends to be interesting to programmers because they enjoy finding clever solutions to problems. However, micro-optimization carries the connotation of a disproportionate amount of effort being expended to extract relatively small improvements.

That's not to say that micro-optimization is bad practice in all circumstances. Sometimes a small improvement in a part of a code base that gets used frequently (such as the innermost part of a loop) can yield big overall gains in system performance. Building code for highly constrained systems such as microcontrollers will also often require cleverness to eke out the most performance from such a small system.

However, it can be tempting to indulge in the practice where it's not necessary. This wastes time that could have been used more productively, and it produces code that is difficult to follow: "clever" solutions to problems are often harder to understand than simple ones, so a micro-optimization can have a negative impact on the maintainability of a piece of code.

Programmers are advised to avoid micro-optimization unless they can make a solid case that the performance gains are worth the problems outlined above. If profiling of the code in question identifies a hot-spot that is causing a performance bottleneck, that can be sufficient justification for a micro-optimization.
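
As a small illustration of the kind of tuning described above, here is a minimal sketch in C of one classic trick for targets with a slow divider: replacing division by 10 with a multiply and a shift. The names div10 and check_div10 are hypothetical, chosen only for this example, and the sketch assumes 32-bit unsigned inputs and that a 64-bit multiply is available.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: divide a 32-bit unsigned value by 10 using a
   multiply and a shift instead of a divide instruction.
   0xCCCCCCCD is ceil(2^35 / 10); the 64-bit product avoids overflow. */
static uint32_t div10(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}

/* Compare the trick against plain division over a strided sample of inputs. */
static void check_div10(void)
{
    for (uint64_t n = 0; n <= UINT32_MAX; n += 65521u) {
        assert(div10((uint32_t)n) == (uint32_t)n / 10u);
    }
    assert(div10(UINT32_MAX) == UINT32_MAX / 10u);
}
```

Optimizing compilers commonly apply this transformation themselves when the divisor is a constant, so a hand-written version like this is only worth considering after inspecting the generated code and profiling.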

900 questions
63
votes
6 answers

Does using xor reg, reg give advantage over mov reg, 0?

There are two well-known ways to set an integer register to zero on x86: either mov reg, 0 or xor reg, reg. There's an opinion that the second variant is better since the value 0 is not stored in the code and that saves several bytes of…
sharptooth
  • 167,383
  • 100
  • 513
  • 979
62
votes
4 answers

How to force GCC to assume that a floating-point expression is non-negative?

There are cases where you know that a certain floating-point expression will always be non-negative. For example, when computing the length of a vector, one does sqrt(a[0]*a[0] + ... + a[N-1]*a[N-1]) (NB: I am aware of std::hypot, this is not…
lisyarus
  • 15,025
  • 3
  • 43
  • 68
61
votes
4 answers

"enter" vs "push ebp; mov ebp, esp; sub esp, imm" and "leave" vs "mov esp, ebp; pop ebp"

What is the difference between the enter and push ebp mov ebp, esp sub esp, imm instructions? Is there a performance difference? If so, which is faster and why do compilers always use the latter? Similarly with the leave and mov esp, ebp pop …
小太郎
  • 5,510
  • 6
  • 37
  • 48
61
votes
9 answers

Divide by 10 using bit shifts?

Is it possible to divide an unsigned integer by 10 by using pure bit shifts, addition, subtraction and maybe multiply? Using a processor with very limited resources and slow divide.
Thomas O
  • 6,026
  • 12
  • 42
  • 60
60
votes
4 answers

Weird use of `?:` in `typeid` code

In one of the projects I'm working on, I'm seeing this code struct Base { virtual ~Base() { } }; struct ClassX { bool isHoldingDerivedObj() const { return typeid(1 ? *m_basePtr : *m_basePtr) == typeid(Derived); } Base *m_basePtr; }; I…
Johannes Schaub - litb
  • 496,577
  • 130
  • 894
  • 1,212
57
votes
1 answer

Why does mulss take only 3 cycles on Haswell, different from Agner's instruction tables? (Unrolling FP loops with multiple accumulators)

I'm a newbie at instruction optimization. I did a simple analysis on a simple function dotp which is used to get the dot product of two float arrays. The C code is as follows: float dotp( const float x[], const float y[],…
Forward
  • 855
  • 7
  • 12
56
votes
4 answers

Why do none of the major compilers optimize this conditional store that checks if the value is already set?

I stumbled across this Reddit post which is a joke on the following code snippet, void f(int& x) { if (x != 1) { x = 1; } } void g(int& x) { x = 1; } saying that the two functions are not equivalent to 'the compiler'. I was…
chrysante
  • 2,328
  • 4
  • 24
55
votes
3 answers

What does `rep ret` mean?

I was testing some code on Visual Studio 2008 and noticed security_cookie. I can understand the point of it, but I don't understand what the purpose of this instruction is. rep ret /* REP to avoid AMD branch prediction penalty */ Of course I…
Devolus
  • 21,661
  • 13
  • 66
  • 113
54
votes
2 answers

INC instruction vs ADD 1: Does it matter?

From Ira Baxter's answer on "Why do the INC and DEC instructions not affect the Carry Flag (CF)?": Mostly, I stay away from INC and DEC now, because they do partial condition code updates, and this can cause funny stalls in the pipeline, and ADD/SUB…
Gilgamesz
  • 4,727
  • 3
  • 28
  • 63
53
votes
16 answers

' ... != null' or 'null != ....' best performance?

I wrote two methods to check their performance public class Test1 { private String value; public void notNull(){ if( value != null) { //do something } } public void nullNot(){ if( null != value) { //do something } } } and checked…
asela38
  • 4,546
  • 6
  • 25
  • 31
50
votes
3 answers

Do java finals help the compiler create more efficient bytecode?

Possible Duplicate: Does use of final keyword in Java improve the performance? The final modifier has different consequences in java depending on what you apply it to. What I'm wondering is if additionally it might help the compiler create more…
Miquel
  • 15,405
  • 8
  • 54
  • 87
50
votes
2 answers

How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent

This loop runs at one iteration per 3 cycles on Intel Conroe/Merom, bottlenecked on imul throughput as expected. But on Haswell/Skylake, it runs at one iteration per 11 cycles, apparently because setnz al has a dependency on the last imul. ;…
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
47
votes
2 answers

Avoiding the overhead of C# virtual calls

I have a few heavily optimized math functions that take 1-2 nanoseconds to complete. These functions are called hundreds of millions of times per second, so call overhead is a concern, despite the already-excellent performance. In order to keep the…
Haus
  • 1,492
  • 7
  • 23
47
votes
2 answers

Can x86's MOV really be "free"? Why can't I reproduce this at all?

I keep seeing people claim that the MOV instruction can be free in x86, because of register renaming. For the life of me, I can't verify this in a single test case. Every test case I try debunks it. For example, here's the code I'm compiling with…
user541686
  • 205,094
  • 128
  • 528
  • 886
46
votes
1 answer

Why does Intel's compiler prefer NEG+ADD over SUB?

In examining the output of various compilers for a variety of code snippets, I've noticed that Intel's C compiler (ICC) has a strong tendency to prefer emitting a pair of NEG+ADD instructions where other compilers would use a single SUB…
Cody Gray - on strike
  • 239,200
  • 50
  • 490
  • 574