
How does one convert a Z-score from the Z-distribution (standard normal distribution, Gaussian distribution) to a p-value? I have yet to find the magical function in Scipy's stats module to do this, but one must be there.

gotgenes

7 Answers


I like the survival function (upper tail probability) of the normal distribution a bit better, because the function name is more informative:

import scipy.stats

p_values = scipy.stats.norm.sf(abs(z_scores))  # one-sided (upper tail)

p_values = scipy.stats.norm.sf(abs(z_scores)) * 2  # two-sided

The normal distribution, "norm", is one of around 90 distributions in scipy.stats.

norm.sf also calls the corresponding function in scipy.special, as in gotgenes' example.

A small advantage of the survival function, sf: numerical precision should be better for quantiles close to 1 than when using the cdf.
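A minimal sketch of that precision point, assuming z_scores is a NumPy array (the sample values are made up for illustration): for a large z-score, 1 - cdf rounds to exactly 0.0 in double precision, while sf still returns the tiny tail area.

```python
import numpy as np
from scipy import stats

# Illustrative z-scores (hypothetical data)
z_scores = np.array([-1.96, 0.5, 1.96, 9.0])

one_sided = stats.norm.sf(np.abs(z_scores))      # upper-tail area beyond |z|
two_sided = 2 * stats.norm.sf(np.abs(z_scores))  # area in both tails

# At z = 9 the cdf is closer to 1 than a double can represent,
# so subtracting it from 1 loses everything; sf does not.
via_cdf = 1 - stats.norm.cdf(9.0)
via_sf = stats.norm.sf(9.0)

print(two_sided)
print(via_cdf, via_sf)
```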

Derek Farren
Josef

I think the cumulative distribution function (cdf) is preferable to the survivor function. The survivor function is defined as 1 - cdf, and may obscure which tail (direction) a percentile refers to. Also, the percent point function (ppf) is the inverse of the cdf, which is very convenient.

>>> import scipy.stats as st
>>> st.norm.ppf(.95)
1.6448536269514722
>>> st.norm.cdf(1.64)
0.94949741652589625

Edit: A user requested an example for "vectors":

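import numpy as np
import scipy.stats as st

vector = np.array([.925, .95, .975, .99])
p_values = [st.norm.ppf(v) for v in vector]    # quantile (z value) for each probability
f_values = [st.norm.cdf(p) for p in p_values]  # the cdf maps them back to the probabilities

for p, f in zip(p_values, f_values):
    print(f'p: {p}, \tf: {f}')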

Yields:

p: 1.4395314709384563,  f: 0.925
p: 1.6448536269514722,  f: 0.95
p: 1.959963984540054,   f: 0.975
p: 2.3263478740408408,  f: 0.99
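Since ppf and cdf accept NumPy arrays directly, the list comprehensions can be dropped. A small sketch (the name levels and the value z = 1.64 are illustrative) that also shows the lower-tail p-value from the cdf and the upper-tail one as its complement:

```python
import numpy as np
import scipy.stats as st

levels = np.array([.925, .95, .975, .99])

z_values = st.norm.ppf(levels)      # quantile (critical z) for each probability
round_trip = st.norm.cdf(z_values)  # the cdf maps the quantiles back

z = 1.64
lower_p = st.norm.cdf(z)      # P(Z <= z)
upper_p = 1 - st.norm.cdf(z)  # P(Z > z), the complement

print(z_values)
print(round_trip)
print(lower_p, upper_p)
```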
Myles Baker

Aha! I found it: scipy.special.ndtr! In older SciPy releases it is also available as scipy.stats.stats.zprob (which is just a pointer to ndtr).

Specifically, given a one-dimensional numpy.array instance z_scores, one can obtain the one-sided (upper-tail) p-values as

import scipy.special

p_values = 1 - scipy.special.ndtr(z_scores)

or alternatively

p_values = scipy.special.ndtr(-z_scores)
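A quick consistency check on a made-up array of z-scores: by the symmetry of the normal curve, ndtr(-z) and norm.sf(z) compute the same upper-tail area, and ndtr(-z) avoids the cancellation that makes 1 - ndtr(z) collapse to zero for large positive z.

```python
import numpy as np
from scipy import special, stats

# Hypothetical z-scores for illustration
z_scores = np.array([-2.0, -0.5, 0.0, 1.0, 3.0, 10.0])

p_ndtr = special.ndtr(-z_scores)          # upper tail via symmetry
p_sf = stats.norm.sf(z_scores)            # upper tail via the survival function
p_one_minus = 1 - special.ndtr(z_scores)  # equal in exact arithmetic, but cancels for large z

print(p_ndtr)
print(p_sf)
print(p_one_minus)
```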
gotgenes
  • Strange terminology, "Z-distribution" instead of "Normal curve". Z-score I'd probably call standard deviation in this context as well. – Nick T Aug 16 '10 at 19:52
  • Well, the Z-distribution == "standard normal distribution" == `N(0, 1)`. That said, your point is well taken. I have updated the question to reflect the various terminology for the same concepts. – gotgenes Aug 16 '10 at 20:43

Starting with Python 3.8, the standard library provides the NormalDist object as part of the statistics module.

It can be used to apply the inverse cumulative distribution function (inv_cdf, also known as the quantile function or the percent-point function) and the cumulative distribution function (cdf):

from statistics import NormalDist

NormalDist().inv_cdf(0.95)
# 1.6448536269514715
NormalDist().cdf(1.64)
# 0.9494974165258963
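Building on this, a standard-library-only sketch of a z-to-p conversion (Python 3.8+; z_to_p and its two_sided flag are hypothetical names, not part of the statistics module). It evaluates the cdf at -|z|, which by symmetry equals the upper-tail area and avoids subtracting from 1:

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_to_p(z: float, two_sided: bool = False) -> float:
    """Return the one-sided (upper-tail) or two-sided p-value for a z-score."""
    tail = _STD_NORMAL.cdf(-abs(z))  # area beyond |z| in one tail
    return 2 * tail if two_sided else tail


print(z_to_p(1.64))
print(z_to_p(1.96, two_sided=True))
```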
Xavier Guihot

From the formula Φ(z) = ½(1 + erf(z/√2)), which gives the lower-tail probability:

import numpy as np
import scipy.special as scsp
def z2p(z):
    """From z-score return the lower-tail p-value, i.e. the standard normal cdf."""
    return 0.5 * (1 + scsp.erf(z / np.sqrt(2)))
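The same formula gives the upper tail through the complementary error function, since 1 - Φ(z) = ½·erfc(z/√2). A sketch (z2p_upper and z2p_two_sided are hypothetical names) using scipy.special.erfc, which stays accurate where 1 - erf(...) would cancel to zero:

```python
import numpy as np
import scipy.special as scsp


def z2p_upper(z):
    """Upper-tail p-value 1 - Phi(z) for a z-score (scalar or array)."""
    # erfc(x) = 1 - erf(x), evaluated without the cancellation for large x
    return 0.5 * scsp.erfc(z / np.sqrt(2))


def z2p_two_sided(z):
    """Two-sided p-value: probability of a result at least as extreme as |z|."""
    return 2 * z2p_upper(np.abs(z))


print(z2p_upper(1.96))
print(z2p_two_sided(np.array([-1.96, 1.96])))
```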
Brad Solomon
import scipy.stats

p_value = scipy.stats.norm.sf(abs(z_score_max))  # one-sided test
p_value = scipy.stats.norm.sf(abs(z_score_max)) * 2  # two-sided test

A z-score table in an intro/AP stats book lists areas under the normal curve (cumulative probabilities), not values of the probability density function (pdf), so the p-value comes from the survival function (sf = 1 - cdf) rather than from the pdf.
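To see what each function returns for the same z-score, a quick comparison (z = 1.96 is just an example value): the pdf gives the height of the bell curve at z, while cdf and sf give areas under it.

```python
from scipy import stats

z = 1.96
density = stats.norm.pdf(z)     # height of the bell curve at z
lower_area = stats.norm.cdf(z)  # area to the left of z
upper_area = stats.norm.sf(z)   # area to the right of z (one-sided upper-tail p-value)

print(f"pdf: {density:.4f}, cdf: {lower_area:.4f}, sf: {upper_area:.4f}")
```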

HK boy

For SciPy lovers: though this is an old question, it is still relevant, and we can handle not only the normal but other distributions as well, so here is a solution for a few more distributions:

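from scipy.stats import chi2, norm, t

# Number of decimals to round to (undefined in the original; 4 is an assumed default)
DECIMAL_LIMIT = 4


def get_p_value_normal(z_score: float, decimal_limit: int = DECIMAL_LIMIT) -> float:
    """get upper-tail p value for normal (Gaussian) distribution

    Args:
        z_score (float): z score
        decimal_limit (int): number of decimals to round to

    Returns:
        float: p value
    """
    return round(norm.sf(z_score), decimal_limit)


def get_p_value_t(t_score: float, df: float, decimal_limit: int = DECIMAL_LIMIT) -> float:
    """get upper-tail p value for Student's t distribution

    Args:
        t_score (float): t statistic
        df (float): degrees of freedom
        decimal_limit (int): number of decimals to round to

    Returns:
        float: p value
    """
    return round(t.sf(t_score, df), decimal_limit)


def get_p_value_chi2(chi2_score: float, df: float, decimal_limit: int = DECIMAL_LIMIT) -> float:
    """get upper-tail p value for chi2 distribution

    Args:
        chi2_score (float): chi-squared statistic
        df (float): degrees of freedom
        decimal_limit (int): number of decimals to round to

    Returns:
        float: p value
    """
    # sf (= 1 - cdf) gives the tail area; ppf would return a quantile instead
    return round(chi2.sf(chi2_score, df), decimal_limit)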
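As a sanity check on how these distributions relate to the normal case (the z value below is arbitrary): the square of a standard normal variable follows a chi-squared distribution with 1 degree of freedom, so chi2.sf(z**2, 1) reproduces the two-sided normal p-value, and the Student's t tail approaches the normal tail as the degrees of freedom grow.

```python
from scipy.stats import chi2, norm, t

z = 1.96

two_sided_normal = 2 * norm.sf(abs(z))
# Z**2 ~ chi-squared(1), so P(Z**2 > z**2) = P(|Z| > |z|)
two_sided_chi2 = chi2.sf(z**2, df=1)

# Heavier tails with few degrees of freedom; close to the normal with many
t_small_df = t.sf(z, df=5)
t_large_df = t.sf(z, df=10_000)

print(two_sided_normal, two_sided_chi2)
print(t_small_df, t_large_df, norm.sf(z))
```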