I am very new to Python and would like to fit a histogram to the sum of an 'exponential-like' and a 'normal-like' distribution. To start, I want to try this with known distributions, all built into scipy.stats. The fit is to be repeated for each pair consisting of one function from pdfs_start and one function from pdfs_bulk.

Since the pdfs in scipy.stats do not appear to have an attribute listing their parameters (besides *args), I checked the documentation and wrote wrapper functions that spell the parameters out explicitly:

import inspect
from scipy.stats import norm, lognorm, expon, gamma, beta, pareto
from scipy.stats import kstest, chisquare
from scipy.optimize import curve_fit

def expon_func(x, exp1, exp2):
    return expon.pdf(x, exp1, exp2)
expon_func.name = expon.name

def beta_func(x, beta1, beta2, beta3, beta4):
    return beta.pdf(x, beta1, beta2, beta3, beta4)
beta_func.name = beta.name

def pareto_func(x, pareto1, pareto2, pareto3):
    return pareto.pdf(x, pareto1, pareto2, pareto3)
pareto_func.name = pareto.name

def norm_func(x, norm1, norm2):
    return norm.pdf(x, norm1, norm2)
norm_func.name = norm.name

def lognorm_func(x, lognorm1, lognorm2, lognorm3):
    return lognorm.pdf(x, lognorm1, lognorm2, lognorm3)
lognorm_func.name = lognorm.name

def gamma_func(x, gamma1, gamma2, gamma3):
    return gamma.pdf(x, gamma1, gamma2, gamma3)
gamma_func.name = gamma.name

pdfs_start = [expon_func, beta_func, pareto_func]
pdfs_bulk = [norm_func, lognorm_func, gamma_func, beta_func]
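
As a possible alternative to hand-written wrappers, the distributions in scipy.stats expose `shapes` (a comma-separated string of shape-parameter names, or None) and `numargs` (the number of shape parameters). The sketch below assumes that every pdf takes its shape parameters followed by loc and scale, in that order; `n_params` and `make_pdf` are hypothetical helper names, not part of scipy.

```python
from scipy.stats import norm, lognorm, expon, gamma, beta, pareto

def n_params(dist):
    # Shape parameters plus loc and scale (assumed order: shapes, loc, scale)
    return dist.numargs + 2

def make_pdf(dist):
    # Wrap dist.pdf as f(x, *params) and attach the name and parameter count
    def pdf(x, *params):
        return dist.pdf(x, *params)
    pdf.name = dist.name
    pdf.n_params = n_params(dist)
    return pdf

pdfs_start = [make_pdf(d) for d in (expon, beta, pareto)]
pdfs_bulk = [make_pdf(d) for d in (norm, lognorm, gamma, beta)]

for f in pdfs_start + pdfs_bulk:
    print(f.name, f.n_params)
```

With this, the parameter count comes from the `n_params` attribute of each wrapper rather than from inspecting its signature.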

I have tried the following function, which returns a new function computing the sum of its two argument functions:

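def create_func(f1, f2):
    # Parameter names of each pdf, excluding the independent variable x
    f1_params = list(inspect.signature(f1).parameters.keys())[1:]
    f2_params = list(inspect.signature(f2).parameters.keys())[1:]

    def sum_func(x, rel2, *params):
        # rel2 scales f1 relative to f2; params holds f1's parameters, then f2's
        return rel2 * f1(x, *params[:len(f1_params)]) + f2(x, *params[len(f1_params):])

    # p0 has one entry for rel2 plus one per parameter of f1 and f2, all set to 1
    return {'Function': sum_func, 'Name': f"{f1.name} + {f2.name}",
            'p0': [1] * (len(f1_params) + len(f2_params) + 1)}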

Notice that the only argument shared by every function in the two lists is the independent variable x. Apart from that, each function has a different number of parameters, which I don't want to specify by hand for each combination.

The 'p0' entry in the dictionary is passed to curve_fit as the initial guess, since otherwise curve_fit cannot determine the number of fit parameters.
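
The role of p0 can be seen in a small sketch that is unrelated to the histogram itself. It assumes a toy straight-line model (`line` is a hypothetical name) whose parameters are hidden behind `*params`: without p0, curve_fit has no named parameters to count and raises a ValueError, whereas the length of p0 fixes the number of fit parameters.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, *params):
    # params = (slope, intercept); the count is not visible in the signature
    slope, intercept = params
    return slope * x + intercept

x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0  # noise-free toy data

try:
    curve_fit(line, x, y)  # no p0: the parameter count cannot be inferred
    needs_p0 = False
except ValueError:
    needs_p0 = True

# With p0 given, its length (2) determines the number of fit parameters
popt, _ = curve_fit(line, x, y, p0=[1.0, 0.0])
```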

An example call to curve_fit is then as follows (in the actual case it would be inside a for-loop, but this is enough here):

# hist and bins are the histogram values and bin edges (not shown here)
fit = create_func(pdfs_start[2], pdfs_bulk[3])
bin_centers = bins[:-1] + (bins[1] - bins[0]) / 2
popt, _ = curve_fit(fit['Function'], bin_centers, hist, p0=fit['p0'])

This actually runs, but the resulting curve is extremely poor (I'm sorry that I can't provide the histogram). I suspect this is a numerical issue and curve_fit simply cannot converge with so many parameters and the given initial guess, but I am not sure.
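
One thing that may contribute: with every initial value set to 1, a beta pdf with loc = 1 and scale = 1 is nonzero only on [1, 2], and a pareto pdf with loc = 1 and scale = 1 is nonzero only for x >= 2, so the starting model can be zero over much of the histogram. The sketch below is a minimal illustration on synthetic data (not the real histogram), assuming an exponential start with loc fixed at 0, a normal bulk, a single weight `w` so that the sum stays normalized, a data-driven initial guess, and bounds. All of these choices are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import expon, norm
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
# Synthetic sample: 30% exponential "start", 70% normal "bulk" (assumed values)
data = np.concatenate([
    rng.exponential(scale=1.0, size=3000),
    rng.normal(loc=6.0, scale=1.0, size=7000),
])

# density=True normalizes the histogram so it is comparable to a pdf
hist, bins = np.histogram(data, bins=60, density=True)
centers = 0.5 * (bins[:-1] + bins[1:])

def mixture(x, w, exp_scale, mu, sigma):
    # Weighted sum of two pdfs; the weights w and 1 - w keep the total area at 1
    return w * expon.pdf(x, 0.0, exp_scale) + (1.0 - w) * norm.pdf(x, mu, sigma)

# Initial guess taken from the data instead of all ones
p0 = [0.5, 1.0, np.median(data), np.std(data)]
# Bounds keep the weight in [0, 1], the scales positive, and mu inside the data range
lower = [0.0, 1e-3, bins[0], 1e-3]
upper = [1.0, np.inf, bins[-1], np.inf]

popt, _ = curve_fit(mixture, centers, hist, p0=p0, bounds=(lower, upper))
print(dict(zip(["w", "exp_scale", "mu", "sigma"], popt)))
```

The same idea (a normalized histogram, an initial guess near the data, and bounds on scales and weights) could be carried over to the other distribution pairs, though each pair would need its own reasonable starting values.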

Can what I intend be implemented? Besides that, do you have any tips to improve my code? I appreciate any insights!

GaloisFan