Possible Duplicate:
Seeing if data is normally distributed in R

I have 6 sets of residuals (fit - model) that I am testing for normality (I am trying to demonstrate that the deviation from the model is within instrumental noise).

The kernel density plots of all of them look approximately Gaussian, and the qqnorm plots look good. I have run all of them through two normality tests: shapiro.test {stats} and ad.test {nortest}. These tests indicate that all the data sets are consistent with normality (p >> 0.05, fail to reject the null hypothesis of normality) except one. Usually I would not question these results, but the test that is coming back as 'not normal' (p < 0.05, reject the null hypothesis of normality) is from the data set that looks MOST Gaussian... I am confused, and would appreciate any help!
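For reference, the checks described above look roughly like the sketch below. The vector `resids` is a simulated placeholder (328 draws from `rnorm`), not my actual residuals, and the Anderson-Darling step assumes the `nortest` package is installed.

```r
# Placeholder data: a simulated stand-in for one set of residuals (NOT the real data).
set.seed(1)
resids <- rnorm(328, mean = 0, sd = 1)

# Visual checks: kernel density estimate and normal Q-Q plot.
plot(density(resids), main = "Kernel density of residuals")
qqnorm(resids)
qqline(resids)

# Shapiro-Wilk test (stats package, attached by default).
sw <- shapiro.test(resids)
print(sw$p.value)

# Anderson-Darling test (nortest package), run only if the package is available.
if (requireNamespace("nortest", quietly = TRUE)) {
  ad <- nortest::ad.test(resids)
  print(ad$p.value)
}
```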

Here is the matrix of my residual kernel density plots, with the p-values from Anderson-Darling normality tests (ad.test) noted. All graphs are on the same scale (x & y). The non-normal peculiarity is the CvsD graph marked in red.

Here is a link to the data for the CvsD comparison.

Why aren't these residuals normal!?

  • I don't think this is a duplicate, since he's asking about a specific problem that arises from those tests. As to normality - sorry to get all philosophical, but as great as normality tests are, computers are still nowhere near the power of the spatial analysis of the human brain. Use `cuts` and `plot(CD_resids)` to see where skews might exist, and compare that with `rnorm` randomly generating a sample size of 328 a bunch of times. – Señor O Nov 09 '12 at 15:28
  • Is it possible that the _n_ of the CvsD comparison is a lot higher than for the other sets? If so, the lower p value may just be an artefact of that, cf. @DWin's comment. – Stephan Kolassa Nov 09 '12 at 19:55
  • To test for normality, you should use the Shapiro-Wilk test, `shapiro.test(x)`, in R. – Ole Petersen Nov 24 '15 at 13:32

1 Answer

In fact, this does not look very Gaussian to me; it looks more like a t distribution with a large n, since it is much more "spiky" than a normal curve. Both ad.test and shapiro.test return p < 0.05 (shapiro.test on your data returns p = 0.002655).

However, note that the usefulness of normality tests is contested; see for example this question. Basically, for large sample sizes even small deviations from a normal distribution are detected, and H0 is rejected.

That said, given that your sample size is only 328, I still believe that in your case the distribution is not really normal.
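A rough way to see the sample-size effect is a small simulation. This is only a sketch under stated assumptions: the t distribution with 10 degrees of freedom is an arbitrary stand-in for a "nearly normal" distribution, and `rejection_rate` is a helper name made up for illustration. It estimates how often shapiro.test rejects at the 5% level for truly normal data and for the mildly heavy-tailed data, at a small sample size, at n = 328, and at the largest n that shapiro.test accepts.

```r
# Fraction of simulated samples for which shapiro.test rejects at level alpha.
# rdist is any function that takes n and returns n random draws.
rejection_rate <- function(n, rdist, reps = 200, alpha = 0.05) {
  p <- replicate(reps, shapiro.test(rdist(n))$p.value)
  mean(p < alpha)
}

set.seed(42)
sizes <- c(50, 328, 5000)  # shapiro.test requires 3 <= n <= 5000

# Truly normal data: the rejection rate should stay near alpha at every n.
normal_rates <- sapply(sizes, rejection_rate, rdist = rnorm)

# Mildly heavy-tailed data: t with 10 df (an illustrative assumption).
t10_rates <- sapply(sizes, rejection_rate, rdist = function(n) rt(n, df = 10))

print(data.frame(n = sizes, normal = normal_rates, t10 = t10_rates))
```

The rates are Monte Carlo estimates, so they will shift somewhat with the seed and with `reps`.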

January
  • A t distribution with a large n is virtually indistinguishable from a normal distribution, especially just by sight. And whether it's more or less "spiky" only depends on the variance of the normal distribution. – Fojtasek Nov 09 '12 at 15:20