Testing distributions when sample size is over 5000

Asked Jan 25 '21 at 15:08

I wanted to check the distribution of my data, so I tried `shapiro.test(x)` from the stats package in R. However, my data has more than 5000 observations, which triggers an error. Is there any way to test this other than visually?
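For reference, a minimal sketch that reproduces the situation. The vector `x` here is simulated stand-in data (an assumption, not the actual data set), and `tryCatch` is used only so the error message can be captured instead of stopping the script.

```r
set.seed(42)                 # reproducible illustration
x <- rnorm(6000)             # hypothetical data with more than 5000 values

# shapiro.test() only accepts between 3 and 5000 non-missing values,
# so this call fails; capture the message rather than aborting.
res <- tryCatch(
  shapiro.test(x),
  error = function(e) conditionMessage(e)
)
print(res)
```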

asked Jan 25 '21 at 15:08

Jack Armstrong


I had to look up the Shapiro test. I guess the assumption is that 5000 data points provide sufficient accuracy for the test. So I suppose you could randomly sample 5000 data points and test those, or sort the data (i.e. compute the order statistics) and take a stratified sample across the sorted values. – SteveM Jan 25 '21 at 15:20
https://stackoverflow.com/questions/60058926/shapiro-wilks-test-is-not-working-in-r-markdown/60059227#60059227 suggests `ks.test` ... it would also be useful to read https://stackoverflow.com/questions/7781798/seeing-if-data-is-normally-distributed-in-r/7788452#7788452 https://stackoverflow.com/questions/17125458/r-shapiro-test-cannot-deal-with-more-than-5000-data-points – Ben Bolker Jan 25 '21 at 16:06
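The two suggestions in the comments above could be sketched roughly as follows. This is an illustration under stated assumptions, not a recommended procedure: `x` is simulated stand-in data, and the normal parameters passed to `ks.test` are estimated from the same sample, which is known to make the resulting p-value conservative.

```r
set.seed(1)                              # reproducible illustration
x <- rnorm(20000, mean = 10, sd = 2)     # hypothetical data, n > 5000

# Suggestion 1: Shapiro-Wilk on a random subsample of 5000 values,
# which stays within the limit enforced by shapiro.test().
sub <- sample(x, size = 5000)
sw  <- shapiro.test(sub)

# Suggestion 2: Kolmogorov-Smirnov test on the full data against a
# normal distribution. Mean and sd are estimated from x itself, so
# treat the p-value as approximate (it tends to be too large).
ks <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

print(sw$p.value)
print(ks$p.value)
```

The subsample result depends on which 5000 points happen to be drawn, so repeating it with a different seed is a cheap sanity check.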
