
Any thoughts on why the distribution generated by np.random.pareto, using the alpha calculated manually in the code below, is slightly different from the original distribution?

Could this mean a different distribution would be a better fit?

import numpy as np
import matplotlib.pyplot as plt

# plot the empirical distribution
plt.hist(data['sdev'], bins=50, density=True, alpha=0.6, color='b')
plt.xlabel("retn_abs(stdevs - mean)")
plt.ylabel("density")
plt.show()

# calc alpha (maximum-likelihood / Hill estimate of the Pareto shape)
data_len = len(data['sdev'])
x_min = np.min(data['sdev'])
x_max = np.max(data['sdev'])

data['sdev'] = np.sort(data['sdev'])        # sorting is not required for the estimate
divide = data['sdev'] / x_min
data['alpha_calc'] = np.log(divide)
alpha = (np.sum(data['alpha_calc']) / data_len) ** -1
error = (alpha - 1) / data_len ** (1 / 2)   # approximate standard error of alpha
eighty = .2 ** ((alpha - 2) / (alpha - 1))  # share held by the top 20% (generalized 80/20 rule)
print("alpha", alpha)
print("error", error)
print("eighty", eighty)
#alpha = 2.75
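
# optional sanity check (a sketch, not part of the original calculation): overlay the
# fitted Pareto density f(x) = alpha * x_min**alpha / x**(alpha + 1), x >= x_min,
# on the histogram, so the fit is compared against the data directly rather than
# against a second, separately binned histogram
xs = np.linspace(x_min, x_max, 500)
pdf = alpha * x_min ** alpha / xs ** (alpha + 1)
plt.hist(data['sdev'], bins=50, density=True, alpha=0.6, color='b')
plt.plot(xs, pdf, 'r-', label='fitted Pareto pdf')
plt.legend()
plt.show()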

# create test dist with the estimated alpha
out = np.random.pareto(alpha, data_len)
out = out + x_min    # shift the samples so the minimum is x_min
cust_min = np.min(out)
cust_max = np.max(out)
cust_max = 13        # cap the plotted range (overrides the sample max)
print("cust_max", cust_max)
plt.hist(out, align='right', bins=50, range=(cust_min, cust_max), density=True, alpha=0.6, color='b')
#plt.hist(out, bins=50, density=True, alpha=0.6, color='b')
plt.show()
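
For what it's worth, one way to make "slightly different" quantitative instead of comparing histograms by eye would be a Kolmogorov-Smirnov test of the sample against the fitted Pareto (a sketch assuming scipy is available; alpha and x_min are the values computed above):

from scipy import stats

# K-S test of the sample against a Pareto with shape alpha and scale x_min
# (scipy's pareto takes shape, loc, scale)
ks_stat, p_value = stats.kstest(data['sdev'], 'pareto', args=(alpha, 0, x_min))
print("KS statistic", ks_stat, "p-value", p_value)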
jdhower33
  • How many elements are there? A random distribution is always going to be quite variable in the short term. – Tim Roberts May 25 '22 at 20:10
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community May 26 '22 at 17:23
  • By elements do you mean observations? The difference persists even for large data sets (e.g., 5,000 observations). – jdhower33 May 31 '22 at 21:07

0 Answers