Fast inverse square root in python on float32

Question

I have done some checks for the fast inverse square root method in python (jupyterlab using python version 3.8.8) and for some reason then I've come to the conclusion that I must either be doing something wrong or there is something about float32 which I do not understand. The problem is that according to the following article: matrix67.com/data/InvSqrt.pdf

Then the magic number 0x5f375a86 is concluded to perform slightly better than 0x5f3759df But with the code below where I check which one is superior (because I suspected that python might do things differently than c/c++) then I come to the conclusion that 0x5f3759df is superior from the range 1 to 300000. 0x5f3759df vs 0x5f375a86

Can someone explain to me what is wrong? Have I overlooked something? I also tried it with some of the values listed on https://en.wikipedia.org/wiki/Fast_inverse_square_root
both for x64 and x32 and it always leads down to the same conclusion that 0x5f3759df is the most accurate which at this point is mental. I have listed the used code below where the "np_invsqrt" are slightly modified versions of this guys code: https://github.com/ajcr/ajcr.github.io/blob/master/_posts/2016-04-01-fast-inverse-square-root-python.md

def np_invsqrt1(x):
    neg_half = -0.5
    half3 = 1.5
    y = np.float32(x)    
    i = y.view(np.int32)
    i = np.int32(0x5f3759df) + -1*np.int32(i >> 1) #quake 32
    y = i.view(np.float32)   
    y = y * (half3 + (neg_half*x * y * y))
    return float(y)

def np_invsqrt2(x):
    neg_half = -0.5
    half3 = 1.5
    y = np.float32(x)    
    i = y.view(np.int32)
    i = np.int32(0x5F375A86) + -1*np.int32(i >> 1) #lomont 32
    y = i.view(np.float32)   
    y = y * (half3 + (neg_half*x * y * y))
    return float(y)


s = np.arange(1, 300000,0.1)
c1 = 0
c2 = 0
for i in range (0,len(s)):
    sq1 = np_invsqrt1(s[i])
    sq2 = np_invsqrt2(s[i])
    sqt1 = (1/np.sqrt(s[i])-sq1)
    sqt2 = (1/np.sqrt(s[i])-sq2)
    #print(sqt1)
    #print(sqt2)
    if min(sqt1,sqt2) == sqt1:
        #print(i+1,1)
        c1 = c1 + 1
    else:
        #print(i+1,2)
        c2 = c2 + 1
print(c1,c2)
if max(c1,c2) == c1:
    print("sq1 is best") 
else:
    print("sq2 is best")

The article you're reading doesn't say that 0x5f375a86 produces a better approximation for every possible input. It says 0x5f375a86 minimizes the maximum relative error - its *worst-case* behavior is better than other constants' worst-case behavior. — user2357112, Mar 15 '22 at 22:22
Ah ok so even though the maximum relative error of one magic number is less than another magic number then it doesn't necessarily mean that it is a better fit? Thank you that is a satisfying answer. — Henrik, Mar 15 '22 at 22:45
This question involves numpy. numpy is not part of python, it is a third-party package. python does not have a float32 type, numpy does. — President James K. Polk, Mar 16 '22 at 00:55

Fast inverse square root in python on float32

0 Answers0