
I have a data set of biological material. I have plotted my standard deviations and can see that all but 2 of my data points are within 3 SD of the mean. Is it accepted that data points within 3 SD of the mean are within normal variation? Or is that dependent on the range and dispersion of the data? I'm not a mathematician, just somebody trying to work out whether I have a process in control. I have always understood 3 SD to represent 95% of the data, so that data inside this is within the normal distribution and not worth investigating. However, I am often asked to investigate data that is well within 2 SD based on how the chart looks! (example chart)

When should one be investigating data as abnormal when using standard deviations?

Many thanks in advance for any help

Creaven

1 Answer

You should take a look at the 68–95–99.7 rule.

About 95% (95.45%) of your data will fall within two standard deviations of the mean if your data follow a normal distribution. If the data follow another distribution, Chebyshev's inequality guarantees that at least 75% of the data fall within two standard deviations. Assuming a normal distribution, about 99.7% (99.73%) of the data will fall within three standard deviations of the mean. If the distribution is not normal, at least 89% (88.89%) will fall there.
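As a rough check on these figures, here is a minimal Python sketch. It assumes the standard identities: for a normal distribution the fraction within k standard deviations is erf(k/√2), and Chebyshev's inequality gives a lower bound of 1 − 1/k² for any distribution with finite variance. The function names `normal_coverage` and `chebyshev_lower_bound` are made up for illustration.

```python
import math

def normal_coverage(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

def chebyshev_lower_bound(k: float) -> float:
    """Chebyshev's guaranteed minimum fraction within k standard deviations.

    Holds for any distribution with finite variance; only informative for k > 1.
    """
    return 1 - 1 / k**2

# Compare the normal-distribution figure with the distribution-free bound.
for k in (2, 3):
    print(f"k={k}: normal {normal_coverage(k):.2%}, "
          f"any distribution at least {chebyshev_lower_bound(k):.2%}")
```

The gap between the two columns is the point: the 95% and 99.7% figures only apply if the data really are close to normal.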

Note that even if your data follows a normal distribution, chance (sampling error) will make it so that those percentages are not exactly the case.

So the numbers do depend on your data, especially on the kind of distribution and the number of data points. If you have 1000 normally distributed data points, you should still expect about 3 of them to fall outside three standard deviations.
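To see how the expected count of points beyond three standard deviations grows with the number of data points, here is a small Python sketch. It assumes independent, normally distributed points with a known mean and standard deviation, which is an idealization of real process data. The function names `expected_outside` and `prob_at_least_one_outside` are hypothetical.

```python
import math

def expected_outside(n: int, k: float = 3.0) -> float:
    """Expected number of n independent normal points beyond k standard deviations."""
    p_outside = 1 - math.erf(k / math.sqrt(2))
    return n * p_outside

def prob_at_least_one_outside(n: int, k: float = 3.0) -> float:
    """Probability that at least one of n independent normal points lies beyond k standard deviations."""
    p_inside = math.erf(k / math.sqrt(2))
    return 1 - p_inside**n

for n in (25, 100, 1000):
    print(f"n={n}: expect {expected_outside(n):.2f} points outside 3 SD, "
          f"P(at least one) = {prob_at_least_one_outside(n):.1%}")
```

With 1000 points the expected count comes out at roughly 2.7, so finding a couple of points outside 3 SD in a long series is not evidence of a problem on its own.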

Rory Daulton
  • I have read your links. My understanding is that if my data set follows a normal distribution (I believe the material I am looking at will), then 99.7% of my data should be expected to fall within 3 SDs. With sample size/error, the 0.3% will play more of a part and I should expect to find outliers. We used ST DEV to monitor a shift in a process step or operation, and what I am understanding is that if the shift is within the bounds of 3 SD I need not be overly concerned, especially if it comes back down. – Creaven Aug 12 '17 at 19:25
  • +1 for the careful distinction between normal distributions and other distributions (and especially for making the connection to Chebyshev's inequality). I could go off every time I read the `3 sigma rule` interpreted as applying to every distribution. – Stefan Zobel Aug 12 '17 at 20:01