Questions tagged [micro-optimization]

Micro-optimization is the process of meticulous tuning of small sections of code in order to address a perceived deficiency in some aspect of its operation (excessive memory usage, poor performance, etc).

Micro-optimization (and optimization in general) tends to be interesting to programmers because they enjoy finding clever solutions to problems. However, micro-optimization carries the connotation of a disproportionate amount of effort being expended to extract relatively small improvements.

That's not to say that micro-optimization is bad practice in all circumstances. Sometimes a small improvement in a frequently executed part of a code base (such as the innermost part of a loop) can yield large overall gains in system performance. Likewise, code for highly constrained systems such as microcontrollers often requires cleverness to eke out the most performance from limited hardware.
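As an illustration of the inner-loop case, here is a minimal C sketch of hoisting a loop-invariant computation out of a loop by hand. The function names are hypothetical. Modern optimizing compilers usually perform this loop-invariant code motion on their own, which is one reason to inspect the generated code or profile before hand-tuning.

```c
#include <stddef.h>

/* Naive version: scale * offset does not depend on i,
 * yet it is written inside the loop body. */
void scale_naive(int *dst, const int *src, size_t n, int scale, int offset)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * scale + scale * offset;
}

/* Hand-hoisted version: the loop-invariant product is
 * computed once, before the loop starts. */
void scale_hoisted(int *dst, const int *src, size_t n, int scale, int offset)
{
    const int bias = scale * offset;   /* invariant across iterations */
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * scale + bias;
}
```

Both functions produce the same result: with `src = {1, 2, 3}`, `scale = 2` and `offset = 5`, each fills `dst` with `{12, 14, 16}`.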

However, it can be tempting to indulge in the practice where it isn't necessary. The result is time spent that could have been used more productively, and code that is hard to follow: "clever" solutions are often more difficult to understand than simple ones, so a micro-optimization can hurt the maintainability of a piece of code.

Programmers are advised to avoid micro-optimization unless they can justify that the performance gains outweigh the costs outlined above. If profiling identifies a hot spot in the code in question that is causing a performance bottleneck, that can be sufficient justification for a micro-optimization.

900 questions
1 vote, 1 answer

Persuading the compiler to set registers outside a loop

Firstly, I will prefix this by saying I don't think it is necessary to understand the functioning of the code below to make a sensible attempt to solve my problem. This is primarily an optimisation problem. The code is to understand what is being…
1 vote, 1 answer

How to hide SHLD delay?

I have a simple bit reader which uses the SHLD instruction (__shiftleft128) to read a bit stream. This works great. However, I have been doing some profiling and I notice that whatever instruction comes after the SHLD instruction takes a lot of…
ronag
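SHLD shifts one 64-bit register left while filling the vacated low bits from a second register. For reference, here is a portable C sketch of the value that MSVC's `__shiftleft128` intrinsic returns: the high 64 bits of the 128-bit value `HighPart:LowPart` shifted left by `Shift` modulo 64. The name `shld64` is hypothetical. It documents the operation and can help when comparing code generation, but it does not by itself hide the latency the question asks about.

```c
#include <stdint.h>

/* Returns the high 64 bits of the 128-bit value hi:lo shifted left
 * by (shift & 63), mirroring what SHLD / __shiftleft128 computes. */
uint64_t shld64(uint64_t lo, uint64_t hi, unsigned shift)
{
    shift &= 63;                 /* the shift count is taken modulo 64 */
    if (shift == 0)
        return hi;               /* avoids lo >> 64, which is undefined in C */
    return (hi << shift) | (lo >> (64 - shift));
}
```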
1 vote, 3 answers

Optimize CSS: Narrow Definition (#mytable tbody span.myclass) better?

I wondered whether or not a 'narrow' definition such as #mytable tbody span.myclass { color: #ffffff; } is better/faster to parse than just .myclass { color: #ffffff; } I read somewhere that narrow definitions supposedly actually…
Alex
1 vote, 5 answers

Most efficient method to create all the letters in the alphabet into a string

Possible Duplicate: Generating an array of letters in the alphabet in C# (Theoretical question only; I was just pondering it while writing a filtering system (not using an alphabet, but it got me thinking).) So let's say I want to create a filter list…
John Mitchell
1 vote, 4 answers

How to optimize this python script further?

I've created this script to compute the string similarity in Python. Is there any way I can make it run any faster? tries = input() while tries > 0: mainstr = raw_input() tot = 0 ml = len(mainstr) for i in xrange(ml): j = 0 …
2hamed
1 vote, 1 answer

Does reading an int array from shared memory preclude bank conflicts?

I am designing a CUDA kernel that will be launched with 16 threads per thread block. I have an array of N ints in shared memory (i.e. per thread block) that I wish to process. If the access pattern of the threads is consecutive into the array then…
twerdster
0 votes, 4 answers

C++: the fastest way to access specific octet of int

Assuming we have 32bit integer, 8bit char, gcc compiler and Intel architecture: What would be the fastest way (with no assembler usage) to extract, say, third octet of integer variable? To store it to a char of some specific place of char[] for…
lithuak
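Assuming "third octet" means bits 16 to 23 (the third byte counting from the least significant end), a shift-and-mask sketch follows; the helper name `octet` is hypothetical. Unlike reading the byte through a `char` pointer or a union, it gives the same answer regardless of byte order. Whether it is actually the fastest option has to be checked in the generated assembly.

```c
#include <stdint.h>

/* Returns octet k of x, counting from the least significant byte.
 * k must be 0..3; a shift by 32 or more would be undefined. */
uint8_t octet(uint32_t x, unsigned k)
{
    /* The cast to uint8_t keeps only the low 8 bits, like & 0xFF. */
    return (uint8_t)(x >> (8 * k));
}
```

For example, `octet(0x11223344u, 2)` yields `0x22`.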
0 votes, 4 answers

Is it better for performance to use variables vs calling things over and over in PHP?

In PHP, is it considered good practice to store things like cookie or session information inside variables instead of calling them over and over? For example: $happiness_level = $_SESSION['happiness_level']; echo $happiness_level.' something'; echo…
TKpop
0 votes, 1 answer

Efficient PHP date comparison

I'm trying to write a function that will return the number of days between two dates as quickly as possible. This function gets called a million times in my code, so optimizing it to the max would be really helpful. The dates are strings…
Alex Grin
0 votes, 1 answer

Speed up strlen using SWAR in x86-64 assembly

The asm function strlen receives a pointer to a string (a char array). To do so, the function may use SWAR on general-purpose registers, but without using XMM registers or SSE instructions. The function checks with the bit manipulation: (v -…
HeapUnderStop
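The bit manipulation the excerpt begins to quote is presumably the well-known SWAR zero-byte test, `(v - 0x0101...01) & ~v & 0x8080...80`, which is nonzero exactly when some byte of `v` is zero. Below is a C sketch of that test and of locating the first zero byte, assuming little-endian byte order as on x86-64; the function names are hypothetical. In a word-at-a-time strlen, one would load aligned 8-byte words and stop at the first word whose mask is nonzero. Doing those over-reads in portable C is undefined behavior, which is one reason such code is usually written in assembly.

```c
#include <stdint.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Nonzero iff some byte of v is 0x00 (classic SWAR test). */
uint64_t zero_byte_mask(uint64_t v)
{
    return (v - ONES) & ~v & HIGHS;
}

/* Index (0..7) of the first zero byte, counting from the least
 * significant byte (memory order on little-endian x86-64), or 8 if
 * there is none. Only the lowest flagged byte is exact: a borrow can
 * also flag a 0x01 byte sitting above a real zero byte. */
unsigned first_zero_byte(uint64_t v)
{
    uint64_t m = zero_byte_mask(v);
    unsigned i = 0;
    if (m == 0)
        return 8;
    while (!(m & 0x80)) {    /* walk up to the lowest flagged byte */
        m >>= 8;
        i++;
    }
    return i;
}
```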
0 votes, 1 answer

Detect null byte in a (16 bit) word in Assembly

I have to detect a null byte in a word, so I have to check whether 8 of the 16 bits are zero: either the front 8 bits or the back 8 bits. The problem is I can't use a lot of cycles, so I need a bit mask that checks the front and the back in just…
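Here is a C sketch of two ways to test whether either byte of a 16-bit word is zero; the function names are hypothetical. The first is a direct two-mask comparison. The second is a 16-bit version of the SWAR zero-byte trick, which needs only a subtract, an AND-NOT, and one mask test with no branches. Which is cheaper depends on the target instruction set, which the question does not name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Direct test: is either byte of the 16-bit word zero? */
bool has_null_byte_simple(uint16_t w)
{
    return (w & 0x00FFu) == 0 || (w & 0xFF00u) == 0;
}

/* Branch-free SWAR variant with the 16-bit constants 0x0101 and
 * 0x8080. The arithmetic is done in 32-bit unsigned; the final mask
 * keeps only bit 7 and bit 15, the high bit of each byte. */
bool has_null_byte_swar(uint16_t w)
{
    uint32_t v = w;
    return ((v - 0x0101u) & ~v & 0x8080u) != 0;
}
```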
0 votes, 1 answer

Compare signed integers and return either 0 or -1 in Thumb2 assembly

In Thumb2 assembly, when r0 and r1 hold signed integers, I would like to have r1=-1 (i.e. 0xffffffff) if r0 < r1, otherwise r1=0. I can simply code: 4288 cmp r0, r1 bfb4 ite lt f04f 31ff movlt.w r1, #-1 2100 …
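In C, the same result can be written without an explicit conditional: a comparison yields 0 or 1, and negating that gives 0 or -1 (all bits set). Here is a sketch; `lt_mask` is a hypothetical name. Whether a compiler turns it into something shorter than the question's CMP/ITE/MOV sequence for a given Thumb-2 core has to be checked in its output.

```c
#include <stdint.h>

/* Returns -1 (0xffffffff) if a < b as signed integers, otherwise 0. */
int32_t lt_mask(int32_t a, int32_t b)
{
    /* (a < b) is 0 or 1; negation maps 1 to -1 and leaves 0 as 0. */
    return -(int32_t)(a < b);
}
```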
0 votes, 1 answer

AND + CMP or SHR + CMP?

I’m wondering what’ll result in overall “better” code (if speed’s equal then compactness): AND-and-CMP… #define is_foo(someuint) ((someuint & (unsigned int)~0x7FU) == 0x001B0080U) … or SHR-and-CMP: #define is_foo(someuint) ((someuint >> 7) ==…
mirabilos
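The two forms are interchangeable whenever the constant has no bits set in positions 0 to 6, which is true of `0x001B0080U`. Masking off the low 7 bits and comparing is then the same as shifting them out and comparing against the constant shifted by 7. Below is a sketch with hypothetical function names. The SHR constant `FOO_PATTERN >> 7` is an assumption, because the question's second macro is truncated.

```c
#include <stdbool.h>

#define FOO_PATTERN 0x001B0080U   /* value from the question; bits 0..6 are 0 */

/* AND-and-CMP form from the question. */
bool is_foo_and(unsigned int x)
{
    return (x & ~0x7FU) == FOO_PATTERN;
}

/* SHR-and-CMP form. Comparing against FOO_PATTERN >> 7 is an assumed
 * reconstruction; it matches the AND form because both ignore bits 0..6. */
bool is_foo_shr(unsigned int x)
{
    return (x >> 7) == (FOO_PATTERN >> 7);
}
```

Since the two are equivalent, a compiler may canonicalize one into the other, so comparing the generated code is more informative than comparing the source.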
0 votes, 1 answer

How to strip debug symbols for real in Xcode?

I have included the following in an Xcode Application project: void very_specific_symbol_that_will_never_appear_by_accident(void) { } I proceeded to disable all 'debug symbol' related settings and enable all 'strip' related settings I could find,…
user16217248
0 votes, 0 answers

Getting the address of a symbol using AT&T x86 assembly

I've been messing around with a toy kernel and I'm confused about accessing a symbol's address. Suppose I'm defining the stack I want to use like this: _stack_bot: .skip 4096 _stack_top: I can't load the stack using: movl _stack_top,…
oda404