Calculate standard deviation using R

Question

Using an R script, solve the following: An expert on process control states that he is 95% confident that the new production process will save between $26 and $38 per unit, with savings values around $32 more likely.
If you were to model this expert's opinion using a normal distribution (by applying the empirical rule), what standard deviation would you use for your normal distribution? (Round your answer to 1 decimal place.)

How much of a normal distribution is contained within +/- 1 sigma (standard deviation) of mu (mean)? That is, how much of the data is between `mu - sigma` and `mu + sigma`? How much (%) between two standard deviations from the mean? (If this is a stats course, then I'm confident you have the plot of a normal distribution, perhaps one that has 1, 2, and maybe 3 deviations indicating area-within.) — r2evans, Sep 17 '20 at 03:22
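The areas asked about in this comment can be computed directly with pnorm(). This is only a minimal sketch; the helper name `coverage` and the variable `cov_pct` are made up for illustration and are not part of the original thread.

```r
# Share of a normal distribution that lies within k standard deviations
# of the mean (the result is the same for any mean and sd).
coverage <- function(k) pnorm(k) - pnorm(-k)

k <- 1:3
cov_pct <- round(100 * coverage(k), 1)  # percentages for 1, 2 and 3 sigma

print(data.frame(k_sigma = k, percent_within = cov_pct))
```

The three values are the familiar 68 / 95 / 99.7 figures of the empirical rule, before rounding to whole percentages.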

David W. Thrower · Accepted Answer · 2020-09-17T05:27:06.803

It appears that whoever wrote this problem is confused and doesn't know whether they are asking a sample-mean confidence interval problem ("95% confident") or a simple population normal distribution problem.

Let's reason through how we can or can't solve this; we will discover some problems with the wording of this problem as we go:

He says he is 95% confident that ... [ignore everything else for now and assume that this is a confidence-interval-of-the-mean problem; we will see why that is wrong]. First, let's find the z-score on the normal distribution that corresponds to a cumulative probability of 0.95. You said you want to do this in R, so use qnorm():

> qnorm(.95)

[1] 1.644854
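For reference, here is a small sketch of what that quantile means in terms of area. The variable names are illustrative only. Note that qnorm(.95) leaves 5% in the upper tail, so the band between -z and +z holds about 90% of the distribution; a symmetric band holding 95% would use qnorm(0.975) instead.

```r
z <- qnorm(0.95)                       # one-sided 0.95 quantile, about 1.644854
central_cover <- pnorm(z) - pnorm(-z)  # area between -z and +z
z_two_sided <- qnorm(0.975)            # leaves 2.5% in each tail, about 1.959964

print(c(z = z, central_cover = central_cover, z_two_sided = z_two_sided))
```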

We know that the money saved is between $26 and $38. If his 95% confidence refers to a sample mean, then $26 is 1.644854 standard errors below the sample mean and $38 is 1.644854 standard errors above it (if this were a confidence interval problem). Their sample mean is presumably $32.

Let's say we try to solve for the standard deviation (StDev). The standard error is:

StDev / sqrt(sample size)

and the confidence interval is:

lower bound: 32 - 1.644854 * StDev / sqrt(sample size);

upper bound: 32 + 1.644854 * StDev / sqrt(sample size) # we will use this below

We could attempt to solve algebraically for StDev by putting the upper bound formula on the left side of the = sign and 38, the upper bound, on the right side:

32 + (1.644854 * StDev / sqrt(sample size)) = 38 ... Now solve for StDev:

StDev = sqrt(sample size) * (38 - 32) / 1.644854 ... if I didn't screw up my mental algebra at midnight without paper in hand.

There's a problem here that this rhetorical exercise was meant to point out: we still have two unknowns, StDev and the sample size. The problem you posted simply doesn't give us enough information to solve this under the assumption that it is a confidence interval from a sample. We are out of luck if this is where they were going with this.
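To make the dependence on the sample size concrete, here is a rough sketch that evaluates the rearranged formula for a few sample sizes. The function name `sd_given_n` and the sample sizes are invented for illustration; the problem itself gives no sample size.

```r
z <- qnorm(0.95)  # same multiplier as used above, about 1.644854

# Standard deviation implied by an upper bound of 38 around a mean of 32,
# IF the interval were a confidence interval from a sample of size n.
sd_given_n <- function(n, upper = 38, centre = 32) {
  sqrt(n) * (upper - centre) / z
}

n <- c(1, 4, 25, 100)   # hypothetical sample sizes
sds <- sd_given_n(n)

print(data.frame(n = n, implied_sd = round(sds, 2)))
```

Every choice of n gives a different standard deviation, so without n there is no single answer under the confidence interval reading.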

It looks like the 95% confidence clause (and absence of a mention of a sample mean) is meant to throw you off, but in reality, it just makes the person asking you this question appear to be confused as to what question they are asking you.

If you re-frame the question and assume that:

1. the "95% confident" clause is junk information;
2. we are talking about the probability that an individual observation falls at or below a given value, not that we are 95% confident that the average observation does; and
3. whoever wrote the question does not understand the proper usage of the phrase "95% confident", or was exhausted when they wrote it, or you mis-transcribed the problem,

then the question would be worded like this: "We know that 95% of customers saved no more than $38 and 5% of customers saved $26 or less." In this case we can drop the standard error term altogether and worry only about the standard deviation and mean of the population:

The population mean is then 32.

The mean + 1.644854 standard deviations is 38 (95% of customers save no more than this).

38 - 32 = 6, which is equal to 1.644854 standard deviations. Algebraically, that's written:

6 = 1.644854 * StDev

Divide both sides by 1.644854:

6 / 1.644854 = StDev

StDev = 3.64774
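The same arithmetic can be done in R rather than by hand. This is just a sketch under the re-framed reading above (5% of values at or below $26, 95% at or below $38); the variable names are illustrative.

```r
lower <- 26
upper <- 38

mu <- (lower + upper) / 2            # midpoint of the two bounds
sigma <- (upper - mu) / qnorm(0.95)  # 6 / 1.644854
sigma_1dp <- round(sigma, 1)         # rounded to 1 decimal place

# Quantiles implied by this mean and sd; they should reproduce the bounds.
bounds <- qnorm(c(0.05, 0.95), mean = mu, sd = sigma)

print(c(mean = mu, sd = sigma, sd_rounded = sigma_1dp))
print(bounds)
```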

Let's verify in R that the standard deviation we calculated is correct, using a cumulative probability of .95, a mean of 32, and our asserted StDev of 3.64774:

> qnorm(.95, mean=32, sd=3.64774)

[1] 38

95% of customers would save $38 or less. This seems right.

> qnorm(.05,mean=32,sd=3.64774)

[1] 26

$26 or less is what the 5% of customers who saved the least got. This seems right also.

Summary:

The question you posted doesn't make any sense. It is either incomplete or mis-transcribed, or whoever wrote it seems to be a little confused.
If you ignore the 95% confidence clause and re-frame the question as guessed above to compensate for its ambiguity, then the answer is: the standard deviation is 3.6.

Thank you David. This is a simple normal distribution problem — N341, Sep 18 '20 at 01:59

score 1 · Answer 2 · answered Nov 09 '21 at 02:26

According to the empirical rule for the normal distribution:

68% of the data falls within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations.

Since the expert is 95% confident, the stated range spans two standard deviations on either side of the mean.

So, min_value: 26 = mean - 2 * standard_deviation, or max_value: 38 = mean + 2 * standard_deviation.

Given mean = 32, solving either of the two equations above gives standard_deviation = 3.0.
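A short sketch of the empirical-rule calculation in R, with a check of how much of the distribution the resulting interval covers. The variable names are illustrative. For comparison only, the last line uses the exact two-sided multiplier qnorm(0.975) (about 1.96) in place of the empirical rule's 2.

```r
lower <- 26
upper <- 38
mu <- (lower + upper) / 2

# Empirical rule: about 95% of values lie within mean +/- 2 sd,
# so the full range spans 4 standard deviations.
sd_empirical <- (upper - lower) / 4

# Area between the bounds under N(mu, sd_empirical).
cover_empirical <- pnorm(upper, mu, sd_empirical) - pnorm(lower, mu, sd_empirical)

# Same idea with the exact multiplier instead of 2.
sd_exact <- (upper - mu) / qnorm(0.975)

print(c(sd_empirical = sd_empirical,
        cover_empirical = cover_empirical,
        sd_exact = sd_exact))
```

With the empirical rule the interval covers a little over 95% (about 95.4%), which is why the exact multiplier gives a slightly larger standard deviation.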

N341 · Answer 3 · 2020-09-23T01:08:39.067

I used this code with lower limit = 2 and upper limit = 3. It works correctly for small values of the limits, but it doesn't work for larger numbers unless I add 0.5 to sd.

> f <- function(lwr, upr){
>   c("mean"= (upr+lwr)/2, 
>     "stddev" =  (upr-lwr)/4, 
>      "sdRound" =round((upr-lwr)/4,1)) } 
> f(2,3)

With this, I get:

mean stddev sdRound
2.50 0.25 0.20

I can't use the value as rounded by R. The correct answer is 0.3, since 0.25 rounded half up to 1 decimal place is 0.3, whereas R's round() returns 0.2 here. When I plug sd=0.3 into the code below, I get the correct upper limit (and also the lower limit):

> upperlimit = round(qnorm(0.95, mean=2.5, sd=0.3),0) 
> lowerlimit = round(qnorm(0.05, mean=2.5, sd=0.3))

upperlimit = 3, lowerlimit = 2

This also works for f(6,9)
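One possible way around the rounding issue is a small "round half up" helper used in place of round(). This is only a sketch: `round_half_up` is a made-up name, not a base R function, and values that are not exactly representable in binary floating point may still round unexpectedly.

```r
# Hypothetical helper: rounds a trailing 5 away from zero,
# instead of to the even digit as round() does for 0.25.
round_half_up <- function(x, digits = 0) {
  m <- 10^digits
  sign(x) * floor(abs(x) * m + 0.5) / m
}

# Same function as above, with only the rounding step swapped.
f <- function(lwr, upr) {
  c(mean    = (upr + lwr) / 2,
    stddev  = (upr - lwr) / 4,
    sdRound = round_half_up((upr - lwr) / 4, 1))
}

res_2_3 <- f(2, 3)
res_6_9 <- f(6, 9)
print(res_2_3)
print(res_6_9)
```

For f(2, 3) this gives sdRound = 0.3, and for f(6, 9) it gives 0.8 (from 0.75).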
