While you said in the comments that you don't want a lookup-table-based solution, I still present one here. The reason is simple: this lookup table is only 516 bytes, and if I compile your `Log2` with `-O3`, I get a ~740-byte function, so it is in the same ballpark.
I didn't create a solution which perfectly matches yours, because your version is not as precise as it could be. I used `rint(log(in/65536.0f)/log(2)*65536)` as a reference. Your version produces a worst-case difference of 2 and an average difference of 1.0; this proposed version has a worst-case difference of 1 and an average difference of 0.2. So this version is more accurate.
About performance: I've checked two microbenchmarks:

- using a simple LCG random generator for input, my version is 29 times faster
- feeding in the numbers 0x10000->0x20000 linearly, my version is 17 times faster
The solution is extremely simple (call `initTable()` once to initialize the lookup table); it linearly interpolates between table elements:
```c
#include <math.h>   /* for rint() and log() */

unsigned short table[0x102];

void initTable() {
    for (int i = 0; i < 0x102; i++) {
        int v = rint(log(i * 0x100 / 65536.0f + 1) / log(2) * 65536);
        if (v > 0xffff) v = 0xffff;   /* clamp the top entries to 16 bits */
        table[i] = v;
    }
}

/* input: 16.16 fixed point in [1.0, 2.0); output: 16.16 fractional log2
 * (renamed from log2, which <math.h> already declares) */
int log2fix(int val) {
    int idx = (val - 0x10000) >> 8;
    int l0 = table[idx];
    int l1 = table[idx + 1];
    return l0 + (((l1 - l0) * (val & 0xff) + 128) >> 8);   /* rounded interpolation */
}
```
I've just played with the table, and here are further results:
- you can decrease the table to 0x82 elements (260 bytes) and still have a worst error of 1 and an average error of 0.32 (you need to put `0.5+` inside `rint()` in this case)
- you can decrease the table to 0x42 elements (132 bytes); the worst error becomes 2 and the average error 0.53 (you need to put `0.75+` inside `rint()` in this case)
- decreasing the table size further increases the worst error significantly