I would like to estimate the mean of a set of data I have.

I have 1000 data points, and I read somewhere that if your sample size is less than 30 you should use a t-score, and otherwise a z-score.

Here is the code I use:

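import numpy as np
import scipy.stats

def mean_confidence_interval(data, confidence=0.95):

    a = np.asarray(data, dtype=float)

    n = len(a)
    # Sample mean and standard error of the mean
    m, se = np.mean(a), scipy.stats.sem(a)
    # Half-width: t critical value with n-1 degrees of freedom (public ppf, not _ppf)
    h = se * scipy.stats.t.ppf((1 + confidence) / 2., n - 1)

    return m, h, (m - h, m + h)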

I'm wondering which function I can use instead of the t-distribution's ppf to calculate the proper z-score.

user3600497

1 Answer

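You use a z-score (z-test) when the population standard deviation is known, and a t-score (t-test) when it is estimated from the data. For large samples (roughly n > 30), the t distribution is so close to the normal distribution that the two give almost the same interval. So in your case, with 1000 points and a standard deviation estimated from the data, I would just keep your t-based confidence intervals for everything.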
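If you still want the z-based version, the normal counterpart of the t quantile is `scipy.stats.norm.ppf`. Here is a minimal sketch; the function name `z_confidence_interval` is made up for illustration, and it assumes you are happy to plug the sample standard error in as if the population standard deviation were known:

```python
import numpy as np
from scipy import stats

def z_confidence_interval(data, confidence=0.95):
    """Normal-approximation interval for the mean (hypothetical helper)."""
    a = np.asarray(data, dtype=float)
    # Sample mean and standard error of the mean
    m, se = a.mean(), stats.sem(a)
    # z critical value: quantile of the standard normal, no degrees of freedom
    z = stats.norm.ppf((1 + confidence) / 2.0)
    h = se * z
    return m, h, (m - h, m + h)

# Compare the two critical values for n = 1000 at 95% confidence
confidence = 0.95
n = 1000
z_crit = stats.norm.ppf((1 + confidence) / 2.0)
t_crit = stats.t.ppf((1 + confidence) / 2.0, n - 1)
print(z_crit, t_crit)
```

Both critical values come out at about 1.96 for n = 1000, with the t value very slightly larger, so the t interval is marginally wider and the practical difference is negligible.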

Sealander