I want to normalize a function (for example, scipy's chi2.pdf) over a range from A to B. By default, chi2.pdf is normalized over its full range of 0 to infinity, so its integral over that range is 1. To renormalize it, I can calculate the integral of the function from A to B and divide the function by that integral. I can implement this with the following code:
import numpy as np
from scipy.stats import chi2
from scipy.integrate import quad
A = 2
B = 4
df = 3
z = quad(chi2.pdf, A, B, args=(df, A))[0]
Here quad passes df as the degrees of freedom and A as loc; I want my chi-square function shifted by A for various reasons. Now that I have z, I can define a new function:
def normalized_chi_2(x, df, A, z):
    y = chi2.pdf(x, df, A) / z
    return y
A quick check with integration again:
integral_chi2 = quad(normalized_chi_2, A, B, args=(df, A, z))[0]
print(integral_chi2)
>0.9999999999999999
This shows that I achieved my goal. But having two functions, and calculating z in the main script, is relatively unwieldy, so I figured I could define a new function and calculate z inside that function.
def normalized_chi_1(x, df, A):
    # B is read from the enclosing (global) scope
    z = quad(chi2.pdf, A, B, args=(df, A))[0]
    y = chi2.pdf(x, df, A) / z
    return y
Now when I do a quick integration again:
integral_chi1 = quad(normalized_chi_1,A,B,args=(df,A))[0]
print(integral_chi1)
>0.42759329552910064
I don't get 1; instead I get a value equal to the integral of the original, unnormalized chi2.pdf (the z above). Another problem is that normalized_chi_1 (which takes df and A, and calculates its own z) is very slow. For example, method 2, where I calculate z outside the function and pass it in, takes ~0.07 seconds, while method 1, where I calculate z inside the function, takes ~7.30 seconds: roughly a hundred times slower.
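One way to see where the extra time might go is to count how often the pdf is evaluated in each method. The sketch below is only an illustration, assuming the slowdown comes from quad re-running the inner integral at every point it samples. The names counted_pdf, normalized_outside, and normalized_inside are hypothetical helpers introduced here, not part of scipy.

```python
from scipy.stats import chi2
from scipy.integrate import quad

A, B, df = 2, 4, 3

calls = {"pdf": 0}

def counted_pdf(x, df, loc):
    # Hypothetical wrapper: same values as chi2.pdf, but counts evaluations.
    calls["pdf"] += 1
    return chi2.pdf(x, df, loc)

# Method 2: z is computed once, outside the integrand.
calls["pdf"] = 0
z = quad(counted_pdf, A, B, args=(df, A))[0]
evals_for_z = calls["pdf"]

def normalized_outside(x, df, loc, z):
    return counted_pdf(x, df, loc) / z

calls["pdf"] = 0
quad(normalized_outside, A, B, args=(df, A, z))
evals_outside = calls["pdf"]

# Method 1: z is recomputed every time the integrand is called.
def normalized_inside(x, df, loc):
    # B is read from the enclosing (global) scope, as in the question.
    z_inner = quad(counted_pdf, A, B, args=(df, loc))[0]
    return counted_pdf(x, df, loc) / z_inner

calls["pdf"] = 0
quad(normalized_inside, A, B, args=(df, A))
evals_inside = calls["pdf"]

print("pdf evaluations to get z once:    ", evals_for_z)
print("pdf evaluations, z passed in:     ", evals_outside)
print("pdf evaluations, z computed inside:", evals_inside)
```

If the assumption holds, the last count should be roughly evals_outside * (evals_for_z + 1), since every sample of the outer integral triggers a full inner integration. That would be consistent with a slowdown of one to two orders of magnitude.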