I am very new to Python and would like to fit a histogram with the sum of an 'exponential-like' and a 'normal-like' distribution. In principle, I want to try this with known distributions, all built into scipy.stats. The process is to be iterated over each pair of a function from pdfs_start and a function from pdfs_bulk.
Since the distributions in scipy.stats do not appear to expose an attribute listing their parameters (besides *args), I checked the documentation and wrote wrapper functions with explicit parameter names:
import inspect
from scipy.stats import norm, lognorm, expon, gamma, beta, pareto
from scipy.stats import kstest, chisquare
from scipy.optimize import curve_fit
# Each wrapper exposes the distribution's parameters (shapes, then loc, scale)
# as named arguments so that inspect.signature can count them.
def expon_func(x, exp1, exp2):
    return expon.pdf(x, exp1, exp2)
expon_func.name = expon.name

def beta_func(x, beta1, beta2, beta3, beta4):
    return beta.pdf(x, beta1, beta2, beta3, beta4)
beta_func.name = beta.name

def pareto_func(x, pareto1, pareto2, pareto3):
    # Use this function's own arguments (the original referred to beta1, beta2, beta3)
    return pareto.pdf(x, pareto1, pareto2, pareto3)
pareto_func.name = pareto.name

def norm_func(x, norm1, norm2):
    return norm.pdf(x, norm1, norm2)
norm_func.name = norm.name

def lognorm_func(x, lognorm1, lognorm2, lognorm3):
    return lognorm.pdf(x, lognorm1, lognorm2, lognorm3)
lognorm_func.name = lognorm.name

def gamma_func(x, gamma1, gamma2, gamma3):
    return gamma.pdf(x, gamma1, gamma2, gamma3)
gamma_func.name = gamma.name

pdfs_start = [expon_func, beta_func, pareto_func]
pdfs_bulk = [norm_func, lognorm_func, gamma_func, beta_func]
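As a side note on the parameter lists: the continuous distributions in scipy.stats carry a `shapes` attribute (a comma-separated string of shape parameter names, or None when there are none), and `loc` and `scale` always follow the shapes. A minimal sketch of how the names could be recovered without hand-written wrappers is below; `param_names` is a hypothetical helper, not part of SciPy.

```python
from scipy.stats import norm, expon, gamma, beta


def param_names(dist):
    """Positional parameter names of a scipy.stats continuous distribution.

    Hypothetical helper: shape names come from dist.shapes, followed by
    the loc and scale parameters that every continuous distribution accepts.
    """
    shapes = [s.strip() for s in dist.shapes.split(",")] if dist.shapes else []
    return shapes + ["loc", "scale"]


for dist in (expon, norm, gamma, beta):
    print(dist.name, param_names(dist))
```

With this, the number of parameters per distribution is simply `len(param_names(dist))`.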
I have tried the following function to build a new one that returns the sum of its two argument functions:
def create_func(f1, f2):
    # Parameter names of each function, excluding the independent variable x
    f1_params = list(inspect.signature(f1).parameters.keys())[1:]
    f2_params = list(inspect.signature(f2).parameters.keys())[1:]

    def sum_func(x, rel2, *params):
        # rel2 scales f1 relative to f2; the rest of params is split between f1 and f2
        return rel2 * f1(x, *params[:len(f1_params)]) + f2(x, *params[len(f1_params):])

    return {'Function': sum_func, 'Name': f"{f1.name} + {f2.name}",
            'p0': [1 for i in range(len(f1_params) + len(f2_params) + 1)]}
Notice that the only argument shared by every function in the two lists is the independent variable x. Besides this one, every function has a varying number of parameters, which I don't want to specify by hand for each combination.
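One way to avoid spelling out the parameters by hand might be to build the sum directly from the scipy.stats distribution objects, using their `numargs` attribute (the number of shape parameters) plus two for `loc` and `scale`. The sketch below is only an illustration under that assumption; `make_sum` and the `'n_params'` key are hypothetical names, and it passes distributions rather than the wrapper functions above.

```python
from scipy.stats import expon, norm


def make_sum(dist1, dist2):
    """Build rel * dist1.pdf + dist2.pdf with a flat parameter list (hypothetical helper)."""
    n1 = dist1.numargs + 2  # shape parameters plus loc and scale
    n2 = dist2.numargs + 2

    def sum_func(x, rel, *params):
        # First n1 values go to dist1, the next n2 values go to dist2
        return (rel * dist1.pdf(x, *params[:n1])
                + dist2.pdf(x, *params[n1:n1 + n2]))

    return {"Function": sum_func,
            "Name": f"{dist1.name} + {dist2.name}",
            "n_params": 1 + n1 + n2}


combo = make_sum(expon, norm)
print(combo["Name"], combo["n_params"])
# Evaluate at x = 1 with rel = 2, expon(loc=0, scale=1), norm(loc=0, scale=1)
value = combo["Function"](1.0, 2.0, 0.0, 1.0, 0.0, 1.0)
```

The `'n_params'` entry gives the length that an initial guess for curve_fit would need.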
The 'p0' entry in the dictionary will be given to curve_fit as an initial guess, since otherwise it will be unable to determine the number of fit parameters.
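It may be worth checking what an all-ones initial guess means for these distributions. Because the trailing parameters are `loc` and `scale`, setting everything to 1 shifts the support: for beta it becomes the interval from 1 to 2, and for pareto it starts at 2. The short check below illustrates this; the sample points in `x` are arbitrary choices for the illustration.

```python
import numpy as np
from scipy.stats import beta, pareto

x = np.array([0.5, 1.5, 2.5])

# beta with a = b = 1, loc = 1, scale = 1: nonzero only for 1 <= x <= 2
beta_vals = beta.pdf(x, 1, 1, 1, 1)

# pareto with b = 1, loc = 1, scale = 1: nonzero only for x >= 2
pareto_vals = pareto.pdf(x, 1, 1, 1)

print(beta_vals)
print(pareto_vals)
```

If most of the histogram lies outside these ranges, the model is zero there at the starting point, which gives the optimizer little information about which way to move.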
An example call to curve_fit is then the following (in the actual case it would sit inside a for-loop, but this is enough here):
popt,_ = curve_fit(create_func(pdfs_start[2],pdfs_bulk[3])['Function'], bins[:-1] + (bins[1] - bins[0])/2, hist, p0 = create_func(pdfs_start[2],pdfs_bulk[3])['p0'])
This actually runs, but the resulting curve is a very poor fit (I'm sorry that I can't provide the histogram). I suspect this is a numerical issue, and that curve_fit simply cannot converge with so many parameters and the given initial guess, but I am not sure.
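To make the suspicion about the initial guess testable, here is a minimal sketch on synthetic data, since the real histogram is not available. It departs from the code above in several ways, all of which are assumptions: the histogram is normalized with `density=True`, the mixture uses weights `w` and `1 - w` instead of `rel2`, the exponential `loc` is fixed at 0, and the initial guess and bounds are derived from the data rather than set to 1.

```python
import numpy as np
from scipy.stats import expon, norm
from scipy.optimize import curve_fit

# Synthetic sample (assumption): 40% exponential start, 60% normal bulk
rng = np.random.default_rng(0)
data = np.concatenate([
    expon.rvs(scale=1.0, size=4000, random_state=rng),
    norm.rvs(loc=5.0, scale=1.0, size=6000, random_state=rng),
])

# Normalized histogram so that it is comparable to a pdf
hist, bins = np.histogram(data, bins=60, density=True)
centers = 0.5 * (bins[:-1] + bins[1:])


def mixture(x, w, e_scale, n_loc, n_scale):
    # Weighted sum of two pdfs; the weights add up to 1
    return (w * expon.pdf(x, loc=0.0, scale=e_scale)
            + (1.0 - w) * norm.pdf(x, loc=n_loc, scale=n_scale))


# Data-driven starting point and bounds that keep weights and scales valid
p0 = [0.5, 1.0, np.median(data), np.std(data)]
bounds = ([0.0, 1e-3, data.min(), 1e-3],
          [1.0, np.inf, data.max(), np.inf])

popt, pcov = curve_fit(mixture, centers, hist, p0=p0, bounds=bounds)
print(dict(zip(["w", "e_scale", "n_loc", "n_scale"], popt)))
```

Comparing the result of this kind of starting point against an all-ones guess on the same data would show whether the poor fit comes from the initial guess or from the model itself.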
Can what I intend be implemented? Besides that, do you have any tips to improve my code? I appreciate any insights!