I have done some checks on the fast inverse square root method in Python (JupyterLab, Python 3.8.8), and I've come to the conclusion that I must either be doing something wrong or there is something about float32 that I don't understand. According to the following article: matrix67.com/data/InvSqrt.pdf
the magic number 0x5f375a86 is concluded to perform slightly better than 0x5f3759df. But with the code below, where I check which of the two is superior (I suspected that Python might do things differently than C/C++), I come to the conclusion that 0x5f3759df wins over the range 1 to 300000.
Can someone explain to me what is wrong? Have I overlooked something?
I also tried it with some of the values listed on https://en.wikipedia.org/wiki/Fast_inverse_square_root, both the 32-bit and 64-bit ones, and it always leads to the same conclusion: 0x5f3759df is the most accurate, which at this point seems mental.
I have listed the code below; the "np_invsqrt" functions are slightly modified versions of the code from this post: https://github.com/ajcr/ajcr.github.io/blob/master/_posts/2016-04-01-fast-inverse-square-root-python.md
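To rule out the bit-reinterpretation step itself as the culprit, here is a small cross-check of my own (the function name `invsqrt_struct` is mine, not from the original post) that does the same trick with the standard `struct` module instead of NumPy views, which should mirror what C does with pointer casting:

```python
import struct

def invsqrt_struct(x, magic=0x5f3759df):
    # Reinterpret the float32 bits of x as an unsigned 32-bit integer.
    i = struct.unpack('<I', struct.pack('<f', x))[0]
    # The magic-constant shift-and-subtract; stays in 32-bit range
    # for any finite positive float, the mask is just a safeguard.
    i = (magic - (i >> 1)) & 0xFFFFFFFF
    # Reinterpret the integer bits back as a float32.
    y = struct.unpack('<f', struct.pack('<I', i))[0]
    # One Newton-Raphson refinement step, same as the NumPy versions.
    return y * (1.5 - 0.5 * x * y * y)
```

This agrees with the NumPy-view versions for me, so the reinterpretation does not seem to be where Python diverges from C.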
import numpy as np

def np_invsqrt1(x):
    neg_half = -0.5
    half3 = 1.5
    y = np.float32(x)
    i = y.view(np.int32)
    i = np.int32(0x5f3759df) - np.int32(i >> 1)  # Quake III constant
    y = i.view(np.float32)
    y = y * (half3 + (neg_half * x * y * y))  # one Newton-Raphson step
    return float(y)

def np_invsqrt2(x):
    neg_half = -0.5
    half3 = 1.5
    y = np.float32(x)
    i = y.view(np.int32)
    i = np.int32(0x5F375A86) - np.int32(i >> 1)  # Lomont's constant
    y = i.view(np.float32)
    y = y * (half3 + (neg_half * x * y * y))  # one Newton-Raphson step
    return float(y)
s = np.arange(1, 300000, 0.1)
c1 = 0
c2 = 0
for i in range(len(s)):
    sq1 = np_invsqrt1(s[i])
    sq2 = np_invsqrt2(s[i])
    sqt1 = 1/np.sqrt(s[i]) - sq1  # signed difference from the reference value
    sqt2 = 1/np.sqrt(s[i]) - sq2
    if min(sqt1, sqt2) == sqt1:
        c1 = c1 + 1
    else:
        c2 = c2 + 1
print(c1, c2)
if max(c1, c2) == c1:
    print("sq1 is best")
else:
    print("sq2 is best")
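For completeness, here is a second scoring sketch I tried (the names `np_invsqrt` and `tally` are my own helpers, not from the original post). It tallies wins by absolute error rather than the signed difference used above, since a signed difference can rank an overshooting approximation as "smaller"; I am not sure whether that is the intended metric in the papers:

```python
import numpy as np

def np_invsqrt(x, magic):
    # Same bit trick as above, parameterized on the magic constant.
    y = np.float32(x)
    i = y.view(np.int32)
    i = np.int32(magic) - np.int32(i >> 1)
    y = i.view(np.float32)
    return float(y * (1.5 + (-0.5 * x * y * y)))  # one Newton-Raphson step

def tally(magic1, magic2, xs):
    # Count, over the sample points xs, how often each constant
    # gives the smaller *absolute* error against 1/sqrt(x).
    c1 = c2 = 0
    for x in xs:
        e1 = abs(1/np.sqrt(x) - np_invsqrt(x, magic1))
        e2 = abs(1/np.sqrt(x) - np_invsqrt(x, magic2))
        if e1 < e2:
            c1 += 1
        else:
            c2 += 1
    return c1, c2
```

Usage would be something like `tally(0x5f3759df, 0x5F375A86, np.arange(1, 300000, 0.1))`.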