A homework question asked me to calculate a confidence interval for a mean. When I calculated it with the traditional formula and with numpy.percentile(), I got different answers.
I think I may be misunderstanding how or when to use np.percentile(). My two questions are: 1. Am I using it incorrectly (wrong inputs, etc.)? 2. Am I using it in the wrong place, i.e. should it be used for bootstrap CIs rather than conventional methods?
I've calculated the CI with both the traditional formula and np.percentile():
import math
import numpy as np
import scipy.stats as ss
price = np.random.normal(11427, 5845, 30)
# u = mean of original vector
# s = std of original vector
print(price)
[14209.99205723 7793.06283131 10403.87407888 10910.59681669 14427.87437741 4426.8122023 13890.22030853 5652.39284669 22436.9686157 9591.28194843 15543.24262609 11951.15170839 16242.64433138 3673.40741792 18962.90840397 11320.92073514 12984.61905211 8716.97883291 15539.80873528 19324.24734807 12507.9268783 11226.36772026 8869.27092532 9117.52393498 11786.21064418 11273.61893921 17093.20022578 10163.75037277 13962.10004709 17094.70579814]
x_bar = np.mean(price) # mean of vector
s = np.std(price) # std of vector (NumPy's default, ddof=0)
n = len(price) # number of obs
z = 1.96 # for a 95% CI
lower = x_bar - (z * (s/math.sqrt(n)))
upper = x_bar + (z * (s/math.sqrt(n)))
med = np.median(price)
print(lower, med, upper)
10838.458908888499 11868.68117628698 13901.386475143861
np.percentile(price, [2.5, 50, 97.5])
[ 4219.6258866 11868.68117629 20180.24569667]
ss.scoreatpercentile(price, [2.5, 50, 97.5])
[ 4219.6258866 11868.68117629 20180.24569667]
I would expect lower, med, and upper to equal the output of np.percentile().
While the median values are the same, the lower and upper bounds differ substantially.
Moreover, scipy.stats.scoreatpercentile() gives the same output as numpy.percentile().
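To make question 2 concrete, here is a minimal sketch of the bootstrap usage I have in mind, where np.percentile() is applied to the means of resampled vectors rather than to the raw prices. The seed, the n_boot value, and the variable names are my own choices for illustration, and the draw differs from the vector printed above, so the numbers will not match. I am not certain this is the intended usage.

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed, only for reproducibility
price = rng.normal(11427, 5845, 30)  # same distribution parameters, new draw

n_boot = 10_000  # number of bootstrap resamples (assumed value)

# Resample the vector with replacement and record the mean of each resample
boot_means = np.array([
    rng.choice(price, size=price.size, replace=True).mean()
    for _ in range(n_boot)
])

# Percentiles of the resampled means (a bootstrap percentile interval)
boot_lower, boot_upper = np.percentile(boot_means, [2.5, 97.5])

# Percentiles of the raw data, as in my np.percentile() call
raw_lower, raw_upper = np.percentile(price, [2.5, 97.5])

print(boot_lower, price.mean(), boot_upper)
print(raw_lower, raw_upper)
```

If I am reading this correctly, the percentiles of the raw data describe the spread of individual prices, while the percentiles of the resampled means describe the uncertainty in the mean, which might explain why the two intervals differ so much.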
Any thoughts?
Thanks!
Edited to show the price vector.