0

I want to fit a truncated normal distribution with scipy. I know the lower and upper thresholds, and I have empirical data. I have that my data is in the range of [np.log(2.5e6), np.inf]. I know that with scipy.stats.truncnorm.fit(data, fb=np.inf) I can give the correct upper bound during the optimization. I also know that the correct argument for fa would be fa=(np.log(2.5e6)-mu)/sigma, but the problem is, that I do not know sigma and mu of the underlying normal distribution. How can I fit those? If I don't provide a, it will give me wrong results. Estimating a with a = (np.log(2.5e6)-np.mean(data)/np.std(data) also gives me a terrible result. If I simulate my problem and give the correct a as an guess, everythings works fine...

I also tried something like this, but I only got Nan values.

#Script adapted from here: #Fitting data using scipy truncnorm

def func(p, r, xa):
    a, loc, scale = p
    return scipy.stats.truncnorm.nnlf([a, np.inf, loc, scale], r)
    #return truncnorm.nnlf(p, r)


def constraint(p, r, xa):
    a, loc, scale = p
    return np.array([a*scale + loc - xa])



xa = np.log(2.5e6)
mu = 14,71
sigma = 1.14
a = (xa-mu)/sigma
log_data = scipy.stats.truncnorm(a=a, b=np.inf, loc=loc, scale=sigma).rvs(10**6)
loc_guess = np.mean(log_data)
scale_guess = np.std(log_data)
a_guess = (xa - loc_guess)/scale_guess
p0 = [a_guess, loc_guess, scale_guess]

#r = log_data

par = scipy.optimize.fmin_slsqp(func, p0, f_eqcons=constraint, args=(log_data, xa),
                 iprint=False, iter=1000)

print(par)

EDIT I also tried this: (First simulate, then once fit without providing the lower normalized truncation point, the second time with the lower truncation point). I don't know how I can give the lower truncation point if I only know the truncation point in absolute value, (i.e. log_data >= 2.5e6) but not normalized with respect to mu and sigma..

import scipy
import numpy as np

THRESHOLD = np.log(2.5e6)
mu = 14.71
sigma = 1.14
A_CORRECT = (THRESHOLD-mu)/sigma
#BLN = prisma_bed_lognorm(mu, sigma)
BLN = scipy.stats.truncnorm(loc=mu, scale=sigma, a=A_CORRECT, b=np.inf)
log_data = BLN.rvs(10**6)
print('theoretical norm', BLN.mean(), BLN.std())
print('emp norm stats', np.mean(log_data), np.std(log_data))
print('LN emp stats', np.mean(data), np.std(data))
#A_GUESS = (THRESHOLD-np.mean(log_data))/np.std(log_data)
#print(A_CORRECT, A_GUESS)
x = scipy.stats.truncnorm.fit(log_data, fb=np.inf, method='MLE')
print("a", "b", "loc", "std")
print(x)
print(scipy.stats.truncnorm.fit(log_data, fb=np.inf, fa=A_CORRECT, method="MLE"))
#Output
a b loc std
(4.332558230073726, inf, -0.26590264062021884, 3.4612803954831683)
(0.019123938454760524, inf, 14.710001292351723, 1.1399781227638002)
Massimo
  • 1
  • 1
  • Yes, i did use scipy.stats.truncnorm.fit(data, fb=np.inf). The problem here is, that I can't tell the optimizer that the threshold itself is fixed, as scipy uses the threshold in a normalized way, i.e. a= (threshold-mu)/sigma. Thus I am not able to give this extra information to the optimizer. When feeding the correct lower truncation point to the optimizer, it will work like a charm. – Massimo Jul 24 '23 at 13:19
  • Can you provide example data, or at least a plot of your random variable? – Reinderien Jul 24 '23 at 13:24
  • I updated the original post for an example. Again, my problem is, that I know the threshold, but only in absolute terms, and not in the way scipy defines it. For my real problem I have data from an unknown truncated norm distribution, where I want to estimate mu and sigma (and thus also (Threshold-mu)/sigma. – Massimo Jul 24 '23 at 14:09

0 Answers0