Questions tagged [micro-optimization]

Micro-optimization is the meticulous tuning of small sections of code to address a perceived deficiency in some aspect of its operation (excessive memory usage, poor performance, etc.).

Micro-optimization (and optimization in general) tends to be interesting to programmers because they enjoy finding clever solutions to problems. However, micro-optimization carries the connotation of a disproportionate amount of effort being expended to extract relatively small improvements.

That's not to say that micro-optimization is bad practice in all circumstances. Sometimes a small improvement in a part of a code base that gets used frequently (such as the innermost part of a loop) can yield big overall gains in system performance, and building code for highly constrained systems such as microcontrollers will often require cleverness to eke out the most performance from such a small system.

However, it can be tempting to indulge in the practice where it's not necessary. This wastes time that could have been used more productively and produces code that is difficult to follow: "clever" solutions to problems are often harder to understand than simple ones, so a micro-optimization can hurt the maintainability of a piece of code.

Programmers are advised to avoid micro-optimization unless they can show that the performance gains outweigh the costs outlined above. If profiling the code in question identifies a hot-spot that is causing a performance bottleneck, that can be sufficient justification for a micro-optimization.
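
To illustrate measuring before optimizing, here is a minimal Python sketch that times a straightforward implementation against a candidate replacement using the standard `timeit` module. The function names `sum_loop`, `sum_closed_form`, and `measure` are hypothetical and chosen only for this example; the numbers it prints depend on the machine and interpreter, so treat them as a rough comparison rather than a benchmark.

```python
import timeit


# Hypothetical example functions; the names are illustrative only.
def sum_loop(n):
    """Straightforward version: add 1..n one term at a time."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_closed_form(n):
    """Candidate optimization: the closed form n * (n + 1) / 2."""
    return n * (n + 1) // 2


def measure(func, n, repeat=5, number=100):
    """Return the best time in seconds for `number` calls of func(n)."""
    return min(timeit.repeat(lambda: func(n), repeat=repeat, number=number))


if __name__ == "__main__":
    n = 10_000
    # Confirm both versions agree before comparing their speed.
    assert sum_loop(n) == sum_closed_form(n)
    print(f"loop:        {measure(sum_loop, n):.6f} s")
    print(f"closed form: {measure(sum_closed_form, n):.6f} s")
```

Taking the minimum over several repeats reduces noise from other processes. A timing like this only says which version is faster in isolation; whether the difference matters still depends on how much of the program's total run time the function accounts for.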

900 questions
0 votes, 1 answer

Cleanest way to check input char is between 0~9 in Assembly

The problem is to convert a string to an int in RISC-V; if any char that is not 0~9 exists, return -1 immediately. I wonder if there's any way to check this using a minimum number of instructions. My way is to put 48 and 57 (the ASCII codes for 0 and 9) in temp…
王韋翰
  • 25
  • 1
  • 8
0 votes, 0 answers

Most insanely efficient way to find index of the minimum of four numbers

#include #include #include using namespace std; class MyTimer { private: std::chrono::time_point starter; std::chrono::time_point ender; public: void…
Duke Le
  • 332
  • 3
  • 14
0 votes, 0 answers

Are there any performance benefits in declaring a new variable that contains child objects in JavaScript?

Does declaring a new variable containing child objects execute faster than using the full path? I.e., does this: var a =…
0 votes, 1 answer

Are these the smallest possible x86 macros for these stack operations?

I'm making a stack based language as a fun personal project. So, I have some signed/unsigned 32-bit values on the stack and my goal is to write some assembly macros that operate on this stack. Ideally these will be small since they'll be used a lot.…
Hrothgar
  • 127
  • 5
0 votes, 0 answers

Are PINSRB and PEXTRB faster or slower than MOV?

I want to store a byte integer array in either a memory location or in an xmm register. To access each byte in that array from memory, I would use: lea rdi,[memory_array] mov al,[rdi] mov [rdi],al To access each byte in that array from an xmm…
RTC222
  • 2,025
  • 1
  • 20
  • 53
0 votes, 1 answer

Optimizing sum of linear sequence = n * (n+1)/2 - anything faster than lea / imul / shrd?

This is my assembly code used to calculate the sum rax = 1 + 2 + 3 + ... + rdi using the Gauss method rax = (rdi + 1) * rdi / 2. Do any of you have an idea or know an Intel instruction to further reduce the number of cycles needed? Remark: the…
0 votes, 0 answers

Using cmov to implement C if / else if / else branchlessly, without any jmp

My task is to use cmov in assembly to implement the C code snippet below. How can I stop after each operation without using any jmp instruction? I am confused. C code: if (x > y) { z = x - y; } else if (y > x) { z = y - x; } else { z =…
Hant
  • 1
  • 2
0 votes, 1 answer

Adding integers from 2 arrays using Vector takes longer time than traditional for loop

I am trying to use Vector to add integer values from 2 arrays faster than a traditional for loop. My Vector count is: 4 which should mean that the addArrays_Vector function should run about 4 times faster than: addArrays_Normally var vectSize =…
Andreas
  • 1,121
  • 4
  • 17
  • 34
0 votes, 0 answers

How to use fast division operation without support of division instruction, for a constant divisor?

How do we divide a/b without using division operation? As my hardware IP does not support division instruction, we use library function to do the division and hence takes a lot of cycles. If I want this division in a fast data path, is there a way…
0 votes, 1 answer

Fastest way to initialize a __m128i constant with intrinsics?

Currently, I've got a __m128i variable, let's call it X. I want to xor it with a constant 128-bit value and save the value back to X. So, essentially X ^= C for some constant C. Currently, I'm doing something along the lines of: X =…
user2059300
  • 361
  • 1
  • 5
  • 17
0 votes, 1 answer

latency for 'pcmpeqb' - memory vs xmm register

I have these 2 options: Option 1: loop: ... movdqu xmm0, [rax] pcmpeqb xmm0, [.zero_table] ... ... align 16 .zero_table: DQ 0, 0 Option 2: pxor xmm1, xmm1 loop: ... movdqu xmm0, [rax] pcmpeqb xmm0, xmm1 ... …
ELHASKSERVERS
  • 195
  • 1
  • 10
0 votes, 1 answer

Dictionary and factorial of large numbers

For n queries I am given a number x and I have to print its factorial under modulo 1000000007. def fact_eff(n, d): if n in d: return d[n] else: ans=n*fact_eff(n-1,d) d[n]=ans return…
0 votes, 1 answer

Assembly push or reserve stack for 2 or more registers

I want to use the 'rbx' and 'rcx' registers in my function, and I want to save them before using them. Since it's 2 registers, I want to know which way is better: push them one by one, or reserve stack space (16 bytes) and copy each value into the stack and…
ELHASKSERVERS
  • 195
  • 1
  • 10
0 votes, 1 answer

Assembly Jump with Multiple plus or do plus before jump (performance)

In assembly, if I have a jump table with the addresses of over 2000 labels: .TABLE: DD .case0 DD .case1 DD .case2 DD .case3 DD .case4 ... ... ... DD .case2000 which way is better for addressing to…
ELHASKSERVERS
  • 195
  • 1
  • 10
0 votes, 1 answer

Performance of assembly function with multiple RET

Does a function such as this have a negative effect on performance? fn: cmp rdi, 0 je lbl0 ... ret lbl0: ... ret call fn And this one? fn0: ... ; no ret, fall through fn1: ... ret
DSblizzard
  • 4,007
  • 7
  • 48
  • 76