I am using different python to fit density functions on a dataset. This data set is made of positive time values starting from 1 second.
I tested different density functions from scipy.statistics
and the powerlaw
library, as well as my own functions using scipy.optimize
's function curve_fit()
.
So far, I obtained the best results when fitting the following "modified" power law function :
def funct(x, alpha, x0):
return((x+x0)**(-alpha))
My code is as follow :
bins = range(1,int(s_distrib.max())+2,1)
y_data, x_data = np.histogram(s_distrib, bins=bins, density=True)
x_data = x_data[:-1]
param_bounds=([0,-np.inf],[np.inf,np.inf])
fit = opt.curve_fit(funct,
x_data,
y_data,
bounds=param_bounds) # you can pass guess for the parameters/errors
alpha,x0 = fit[0]
print(fit[0])
C = 1/integrate.quad(lambda t: funct(t,alpha,x0),1,np.inf)[0]
# Calculate fitted PDF and error with fit in distribution
pdf = [C*funct(x,alpha,x0) for x in x_data]
sse = np.sum(np.power(y_data - pdf, 2.0))
print(sse)
fig, ax = plt.subplots(figsize=(6,4))
ax.loglog(x_data, y_data, basex=10, basey=10,linestyle='None', marker='.')
ax.loglog(x_data, pdf, basex=10, basey=10,linestyle='None', marker='.')
The fitting returns a value of 8.48 for x0, and of 1.40 for alpha. In the loglog plot, the data and fit plot look like this :
- My first question is technical. Why do I get the following warning and error in
opt.curve_fit
when changing (x+x0) to (x-x0) in thefunct
function ? Since my bounds for x0 are (-inf, +inf), I was expecting the fitting to return -8.48.
/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:3: RuntimeWarning: divide by zero encountered in reciprocal This is separate from the ipykernel package so we can avoid doing imports until ValueError: Residuals are not finite in the initial point.
- My other questions are theoritical. Is (x+x0)^(-alpha) a standard distribution? What does the x0 value represents, how to interpret this 8.48s value physically? From what I understand this means that my distribution corresponds to a shifted power law distribution? Can I consider that x0 corresponds to the xmin value classically needed when fitting data to power laws ?
- Concerning this xmin value, I understand that it can make sense to consider only the data greater than this threshold for the fitting process to characterise the tail of the distribution. However, I am wondering what is the standard way to characterise the full data with a distribution that would be a power law after xmin and something else before xmin.
This is a lot of questions as I am very unfamiliar with the subject, any comment and answer, even partial, will be very appreciated!