I have the following data from 10 different people.
df = pd.DataFrame({'id':range(1, 11),
'x':[.7,-1.6,-.2,-1.2,-.1,3.4,3.7,.8,0,2]})
print(df)
id x
0 1 0.7
1 2 -1.6
2 3 -0.2
3 4 -1.2
4 5 -0.1
5 6 3.4
6 7 3.7
7 8 0.8
8 9 0.0
9 10 2.0
I want to calculate a 95 per cent confidence interval for the population mean of df[x]
.
Since the number of observations is small, the sample mean should follow the t distribution with 10 - 1 degrees of freedom. I tried the following in order to calculate a 95 per cent C.I. using scipy
:
# Libraries
import numpy as np
from scipy import stats
# Number of observations
n_obs = 10
# Observed mean
m_obs = df['x'].mean()
# Observed variance (unbiased)
v_obs = df['x'].var(ddof=1) / n_obs
# Declare random variable with observed parameters
t = stats.t(df=n_obs - 1, loc=m_obs, scale=np.sqrt(v_obs))
# Calculate 95% CI
t.interval(alpha=0.95)
> (-0.5297804134938646, 2.0297804134938646) ### Correct interval
This confidence interval is correct. However, I get a completely different result when I manually calculate the interval. What is causing this?
# T such that P(t < T) = 0.975
T = t.ppf(0.975)
# Manually compute interval
(m_obs - (T * np.sqrt(v_obs)), m_obs + (T * np.sqrt(v_obs)))
> (-0.3983168630668432, 1.8983168630668432) ### Incorrect interval