
If I test x against a Poisson hypothesis, I use mean(x) as lambda to calculate p, so df = k - 2; if I test against a Normal hypothesis, I use mean(x) and var(x) to calculate p, so df = k - 3. How can R return a chi-squared statistic and p-value without knowing how many degrees of freedom were lost to the parameters estimated to get p?
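The statistic and the degrees-of-freedom rule the question relies on can be written as below. Here k is the number of bins, O_i and E_i = n p_i are the observed and expected counts, and m is the number of parameters estimated from the data.

```latex
\begin{aligned}
X^2 &= \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad E_i = n\,\hat{p}_i, \\
\mathrm{df} &= k - 1 - m, \\
\text{Poisson } (\hat\lambda = \bar{x}) &: \ m = 1 \;\Rightarrow\; \mathrm{df} = k - 2, \\
\text{Normal } (\hat\mu = \bar{x},\ \hat\sigma^2 = s^2) &: \ m = 2 \;\Rightarrow\; \mathrm{df} = k - 3.
\end{aligned}
```

Strictly, k - 1 - m is the asymptotic null distribution when the m parameters are estimated from the binned counts (for example by minimum chi-square). When they are estimated from the raw observations, as mean(x) and var(x) are, Chernoff and Lehmann showed the null distribution of X^2 lies between chi-square with k - 1 - m and k - 1 df, so using k - 1 - m gives slightly liberal p-values.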

```r
# Say I have some data that I want to test against a Poisson distribution
data <- c(0, 0, 0, 1, 0, 1, 2, 2)
lambda <- mean(data)   # 0.75
bins <- c(0, 1, 2, 3)  # bins for grouping the data
x <- c(4, 2, 2, 0)     # number of observations in each bin
p <- dpois(bins, lambda)
chisq.test(x, p = p, rescale.p = TRUE)
# The df should be (number of bins) - 1 - (number of estimated parameters) = 4 - 1 - 1 = 2,
# but R reports df = 3, ignoring (not knowing about) the 1 df lost to estimating lambda.
```
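One possible workaround, sketched here under the assumption that only the reference distribution needs to change, is to keep the statistic that `chisq.test` computes and recompute the p-value with `pchisq` and the reduced df. The names `obs`, `lambda_hat`, `n_est` and `df_adj` are illustrative, not part of any R API.

```r
# Data and estimated lambda from the question
obs <- c(0, 0, 0, 1, 0, 1, 2, 2)
lambda_hat <- mean(obs)                # 0.75, one estimated parameter
x <- c(4, 2, 2, 0)                     # counts for bins 0, 1, 2, 3
p <- dpois(0:3, lambda_hat)

# chisq.test() computes the Pearson statistic but assumes df = k - 1.
# Its small-expected-count warning is suppressed only to keep output short.
res <- suppressWarnings(chisq.test(x, p = p, rescale.p = TRUE))

# Recompute the p-value after removing one extra df for lambda_hat
k <- length(x)
n_est <- 1
df_adj <- k - 1 - n_est                # 4 - 1 - 1 = 2
p_adj <- pchisq(unname(res$statistic), df = df_adj, lower.tail = FALSE)

c(statistic = unname(res$statistic), df = df_adj, p.value = p_adj)
```

With only 8 observations most expected counts are below 5, so R warns that the approximation may be incorrect; the adjusted p-value is therefore only illustrative for data this small.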

I group the original observations into bins to get a frequency vector x, estimate the unknown parameters of the hypothesized distribution from the same data, and use them to compute the null-hypothesis probability of each bin (the vector p). Then I call chisq.test(x, p=p, rescale.p=TRUE) to test x against the distributional assumption. Is this the right way to do such a test?
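The same steps for the Normal case mentioned above (two estimated parameters, so df = k - 3) might look like the sketch below. The data are simulated with an arbitrary seed and the bin edges are chosen arbitrarily; neither comes from the question. Open-ended outer bins make the probabilities sum to one, so `rescale.p` is not needed.

```r
set.seed(1)                            # illustrative simulated data only
y <- rnorm(200, mean = 10, sd = 2)

# Two parameters estimated from the raw data
mu_hat <- mean(y)
sd_hat <- sqrt(var(y))

# Bins with open-ended outer classes: (-Inf, 7], (7, 9], ..., (13, Inf)
breaks <- c(-Inf, 7, 9, 11, 13, Inf)
x <- as.vector(table(cut(y, breaks)))  # observed count per bin
p <- diff(pnorm(breaks, mean = mu_hat, sd = sd_hat))

res <- chisq.test(x, p = p)            # reports df = k - 1
k <- length(x)
df_adj <- k - 1 - 2                    # mean and variance were estimated
p_adj <- pchisq(unname(res$statistic), df = df_adj, lower.tail = FALSE)

c(statistic = unname(res$statistic), df = df_adj, p.value = p_adj)
```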

limestreetlab
  • I think you are calling the function the wrong way. You need to (1) use the `p=` option to give the probabilities, and (2) add a "4+" class so that the probabilities add up to one. This link can help: https://stats.stackexchange.com/questions/92627/how-to-use-the-chi-squared-test-to-determine-if-data-follow-the-poisson-distribu. That said, I don't think `chisq.test` takes into account the fact that the probabilities are calculated with an estimated parameter (`lambda`), so I think this function is not appropriate for your case. – Kota Mori Jul 26 '20 at 16:01
  • Yes, the function seems to use k - 1 (k = number of bins). But when you test data against a distributional assumption, you can rarely specify the parameters in advance, so degrees of freedom are lost, and how many depends on the distribution. I think the function should accept a df parameter to make it more flexible. – limestreetlab Jul 26 '20 at 16:07
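Base R's `chisq.test` has no argument for the number of estimated parameters, but a thin wrapper along the lines the comments suggest can be sketched. `chisq_gof` and its `n_est` argument are hypothetical names, not part of R; the wrapper reuses the `htest` object from `chisq.test` and only replaces the df and p-value. The usage example also adds the open-ended "4+" class from the first comment, so the probabilities sum to one without rescaling.

```r
# Hypothetical helper (not in base R): Pearson goodness-of-fit test where
# n_est parameters of the null distribution were estimated from the data.
chisq_gof <- function(x, p, n_est = 0, rescale.p = FALSE) {
  res <- chisq.test(x, p = p, rescale.p = rescale.p)
  df <- length(x) - 1 - n_est
  if (df < 1) stop("need at least n_est + 2 bins")
  res$parameter[] <- df                # keep the "df" name for printing
  res$p.value <- pchisq(unname(res$statistic), df = df, lower.tail = FALSE)
  res$method <- paste(res$method, "(df adjusted for", n_est,
                      "estimated parameter(s))")
  res
}

# Usage: the question's Poisson data with classes 0, 1, 2, 3 and "4+"
obs <- c(0, 0, 0, 1, 0, 1, 2, 2)
lambda_hat <- mean(obs)
x <- c(4, 2, 2, 0, 0)
p <- c(dpois(0:3, lambda_hat), ppois(3, lambda_hat, lower.tail = FALSE))

# Expected counts are tiny here, so R's approximation warning is suppressed
out <- suppressWarnings(chisq_gof(x, p, n_est = 1))
out
```

Returning the modified `htest` object keeps the usual printed summary, now showing df = 5 - 1 - 1 = 3.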

0 Answers