
Problem statement: variable X has a mean of 15 and a standard deviation of 2.

What is the minimum percentage of X values that lie between 8 and 17?

I know about the 68-95-99.7 empirical rule. From Google I found that the percentage of values within 1.5 standard deviations of the mean is 86.64%. My code so far:

import scipy.stats
import numpy as np
X = np.random.normal(15, 2)  # without a size argument this draws a single sample
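The empirical-rule figures, including the 86.64% for 1.5 standard deviations, follow from the normal CDF. A minimal check, assuming X is normally distributed (the question does not say so explicitly; the answer below makes the same assumption). `fraction_within` is a hypothetical helper introduced here for illustration:

```python
from scipy.stats import norm

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of its mean."""
    # P(-k <= Z <= k) for a standard normal Z, i.e. Phi(k) - Phi(-k)
    return norm.cdf(k) - norm.cdf(-k)

for k in (1, 1.5, 2, 3):
    print(f"within {k} sd: {fraction_within(k):.2%}")
# within 1 sd: 68.27%
# within 1.5 sd: 86.64%
# within 2 sd: 95.45%
# within 3 sd: 99.73%
```

Because the result depends only on k, the same numbers apply to any normal distribution, whatever its mean and standard deviation.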

As I understand it:

13-17 is within 1 standard deviation and contains about 68% of the values.

9-21 is within 3 standard deviations and contains about 99.7% of the values.

7-23 is within 4 standard deviations, so 8 is 3.5 standard deviations below the mean.

How do I find the percentage of values between 8 and 17?
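The interval 8-17 is not symmetric about the mean (8 is 3.5 standard deviations below it, 17 is 1 standard deviation above it), so a single "within k sd" figure does not apply directly. One way to extend the empirical-rule reasoning, assuming X is normal: split the interval at the mean and take half of each symmetric band, since a normal distribution is symmetric. `fraction_within` is the same hypothetical helper as above, repeated so the snippet runs on its own:

```python
from scipy.stats import norm

mu, sd = 15, 2
lower, upper = 8, 17

# Standardize each bound: how many standard deviations from the mean?
z_lower = (lower - mu) / sd   # -3.5
z_upper = (upper - mu) / sd   #  1.0

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    return norm.cdf(k) - norm.cdf(-k)

# Split [8, 17] at the mean: [8, 15] is half of the 3.5-sd band,
# [15, 17] is half of the 1-sd band (by symmetry of the normal PDF).
share = 0.5 * fraction_within(abs(z_lower)) + 0.5 * fraction_within(z_upper)
print(f"{share:.1%}")  # 84.1%
```

This agrees with the CDF-difference approach in the answer, which avoids the split altogether.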

MVKXXX
  • I will be highly obliged if someone kindly replies. – MVKXXX Feb 01 '21 at 05:41
  • Take a look at https://en.wikipedia.org/wiki/Probability_density_function – maria Feb 01 '21 at 12:40
  • How about adding a third parameter to your `X=np.random.normal(15,2)` which corresponds to a large number of samples, then count the ones above and below the mean +/- a number of standard deviations. – Mark Setchell Feb 01 '21 at 13:37
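Mark Setchell's suggestion can be sketched as a Monte Carlo estimate: draw many samples and count the fraction that lands in [8, 17]. This assumes normality and uses NumPy's `default_rng` Generator API; the legacy `np.random.normal(15, 2, size=n)` from the question would work the same way. The seed and sample count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is reproducible
n = 1_000_000

# Passing a size draws many samples instead of a single value
samples = rng.normal(loc=15, scale=2, size=n)

# Boolean mask of samples inside [8, 17]; its mean is the fraction inside
fraction = np.mean((samples >= 8) & (samples <= 17))
print(f"{fraction:.1%}")
```

With a million samples the estimate should land very close to the exact value, but it is still a random estimate, so it can differ slightly from run to run if the seed changes.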

1 Answer


You basically want to know the area under the Probability Density Function (PDF) from x1=8 to x2=17.

The integral of the PDF is the Cumulative Distribution Function (CDF): CDF(x) is the area under the PDF from minus infinity up to x.

Thus, to find the area between two specific values of x, you integrate the PDF between them, which is equivalent to computing CDF(x2) - CDF(x1).
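Written out, with f the PDF, F the CDF and Φ the standard normal CDF, and assuming X is normal with μ = 15 and σ = 2, standardizing with z = (x - μ)/σ gives (values rounded to five decimals):

```latex
F(x) = \int_{-\infty}^{x} f(t)\,dt,
\qquad
P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f(t)\,dt = F(x_2) - F(x_1)

P(8 \le X \le 17)
  = \Phi\!\left(\frac{17 - 15}{2}\right) - \Phi\!\left(\frac{8 - 15}{2}\right)
  = \Phi(1) - \Phi(-3.5)
  \approx 0.84134 - 0.00023
  = 0.84111
```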

So, in Python, we could do:

import numpy as np
import scipy.stats as sps
import matplotlib.pyplot as plt

mu = 15
sd = 2
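# define the distribution: normal with mean mu and standard deviation sd
dist = sps.norm(loc=mu, scale=sd)
# x grid spanning practically the whole distribution (0.001% to 99.999% quantiles)
x = np.linspace(dist.ppf(.00001), dist.ppf(.99999), 500)
# Probability Density Function
pdf = dist.pdf(x)
# Cumulative Distribution Function
cdf = dist.cdf(x)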
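To confirm that the CDF difference really is the area under the PDF, one can integrate the PDF numerically with `scipy.integrate.quad` and compare. This is an illustrative cross-check, not part of the original answer:

```python
from scipy import integrate
import scipy.stats as sps

dist = sps.norm(loc=15, scale=2)

# Numerically integrate the PDF from 8 to 17 (area under the curve);
# quad returns the estimate and an estimate of its absolute error
area, abs_err = integrate.quad(dist.pdf, 8, 17)

# The same area via the CDF
cdf_diff = dist.cdf(17) - dist.cdf(8)
print(f"quad: {area:.6f}, CDF difference: {cdf_diff:.6f}")
```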

and plot them to take a look:

fig, axs = plt.subplots(1, 2, figsize=(12, 5))

axs[0].plot(x, pdf, color='k')
axs[0].fill_between(
    x[(x>=8)&(x<=17)],
    pdf[(x>=8)&(x<=17)],
    alpha=.25
)
axs[0].set(
    title='PDF'
)

axs[1].plot(x, cdf)
axs[1].axhline(dist.cdf(8), color='r', ls='--')
axs[1].axhline(dist.cdf(17), color='r', ls='--')
axs[1].set(
    title='CDF'
)
plt.show()

[Figure: left panel "PDF" with the area between 8 and 17 shaded; right panel "CDF" with dashed red lines at CDF(8) and CDF(17)]

So the value we want is that area, which we can calculate as

cdf_at_8 = dist.cdf(8)

cdf_at_17 = dist.cdf(17)

cdf_between_8_17 = cdf_at_17 - cdf_at_8

print(f"{cdf_between_8_17:.1%}")

which gives 84.1%.
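If SciPy is not available, the Python standard library (3.8+) has `statistics.NormalDist`, whose `cdf` method gives the same result. A minimal alternative sketch, again assuming X is normal:

```python
from statistics import NormalDist

# Standard-library normal distribution; no SciPy needed
X = NormalDist(mu=15, sigma=2)

# Area under the PDF between 8 and 17 as a CDF difference
p = X.cdf(17) - X.cdf(8)
print(f"{p:.1%}")  # 84.1%
```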

Max Pierini