
I have two small data sets:

infected.data.r.p <- structure(list(MLH = c(0.520408163265306, 0.436170212765957, 
0.344086021505376, 0.423076923076923, 0.406976744186047), ColGrowthCL_6 = c(5.923728814, 
0.283950617, 0.377358491, 1.728070175, 0.2)), .Names = c("MLH", 
"ColGrowthCL_6"), row.names = c("12", "22", "28", "30", "34"), class = "data.frame")

and

uninfected.sampling <- structure(list(MLH = c(0.524271844660194, 0.457446808510638, 
0.354838709677419, 0.398058252427184, 0.436893203883495), ColGrowthCL_6 = c(4.401639344, 
4.827586207, 6.387096774, 6.320754717, 4.225490196)), .Names = c("MLH", 
"ColGrowthCL_6"), row.names = c("218", "18", "21", "212", "99"
), class = "data.frame")

When I try to compare the two models fitted to these data sets using anova() in R (see below), it fails to produce a p-value. I'm not convinced that the nature of the two data sets is causing the problem (although I'm also curious what exactly differs between their structures), but I suppose it very well could be. Thank you!

Model comparison syntax:

infected.model   <- glm(ColGrowthCL_6 ~ MLH, family = poisson, data = infected.data.r.p)
uninfected.model <- glm(ColGrowthCL_6 ~ MLH, family = poisson, data = uninfected.sampling)

compare <- anova(infected.model, uninfected.model, test = "Chisq")
print(compare)
summary(compare)
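
For reference, here is a minimal sketch of the kind of comparison that anova() with test = "Chisq" is designed for: two nested models fitted to the same data frame. This is purely illustrative (the intercept-only null model below is not part of my original analysis), and the Poisson family will warn about the non-integer responses:

# Illustration only: nested models fitted to the SAME data set.
# The intercept-only model is nested inside the MLH model, so the
# likelihood-ratio (Chi-squared) test can fill in the p-value column.
null.model <- glm(ColGrowthCL_6 ~ 1,   family = poisson, data = infected.data.r.p)
mlh.model  <- glm(ColGrowthCL_6 ~ MLH, family = poisson, data = infected.data.r.p)
anova(null.model, mlh.model, test = "Chisq")  # expect warnings about non-integer responses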
Atticus29
  • To clarify, the second data set was drawn randomly from a larger data set so that it would have the same sample size as the first data set. I will later repeat this process many times, but I wanted to troubleshoot this pilot run first. – Atticus29 Aug 24 '13 at 22:00
  • 2
    Statistically what you're trying to do doesn't really make sense. At least if you're trying to compare via anova. – Dason Aug 24 '13 at 23:06
  • Point taken (see below). Any advice if not anova? Another way of thinking about this is that I want to know whether five samples randomly drawn from the larger data set would produce the kind of regression coefficient I see in the smaller data set (more than 5% of the time). I guess I could find this out empirically with a permutation test (see the sketch after these comments)... – Atticus29 Aug 24 '13 at 23:16
  • 1
    What exactly do you want to do? The question you're trying to answer is not clear to me. – Dason Aug 24 '13 at 23:43
  • I really want to put the following as an answer, but I'll refrain and just post it as a comment instead: to answer the question of "what does it mean when the output of anova doesn't produce a p-value" (even if you specify a test), it means you're asking for something that doesn't make sense. – Dason Aug 24 '13 at 23:59
  • 2
    I hope you don't mind - I removed a bunch of useless stuff from your data that was filling the screen... – Dason Aug 25 '13 at 00:06
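
Following up on the permutation idea in the comments above, here is a rough, hedged sketch. It assumes the full uninfected data live in a data frame called uninfected.full (not shown in the question); the idea is to repeatedly draw five rows, refit the regression, and see how often the resampled MLH coefficient is at least as extreme as the one estimated from the infected colonies:

# Hypothetical sketch of the resampling idea from the comments.
# Assumes `uninfected.full` is the larger uninfected data set (not shown above).
obs.coef <- coef(glm(ColGrowthCL_6 ~ MLH, family = poisson,
                     data = infected.data.r.p))["MLH"]

n.reps <- 1000
perm.coefs <- replicate(n.reps, {
  samp <- uninfected.full[sample(nrow(uninfected.full), 5), ]
  coef(glm(ColGrowthCL_6 ~ MLH, family = poisson, data = samp))["MLH"]
})

# Proportion of resampled coefficients at least as extreme as the observed one
mean(abs(perm.coefs) >= abs(obs.coef))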

1 Answer


I believe you can only compare models that are fitted to the same data set. The Chi-squared test in anova() compares two nested models fitted to the same data, so perhaps that is why your p-values aren't being calculated.
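
One way to make the comparison happen within a single data set, sketched here on the assumption that a grouping factor is what you are after: stack the two data frames, add an infected/uninfected factor, and test whether the MLH slope differs between groups by comparing nested models. The column name `status` is my own invention for illustration:

# Sketch: combine both groups into one data frame so the two models are
# nested and fitted to the same data, which is what anova(..., test = "Chisq")
# expects.
combined <- rbind(
  cbind(infected.data.r.p,   status = "infected"),
  cbind(uninfected.sampling, status = "uninfected")
)
combined$status <- factor(combined$status)

m.main <- glm(ColGrowthCL_6 ~ MLH + status, family = poisson, data = combined)
m.int  <- glm(ColGrowthCL_6 ~ MLH * status, family = poisson, data = combined)

# Does the MLH slope differ between infected and uninfected colonies?
anova(m.main, m.int, test = "Chisq")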

phg
  • 536
  • 1
  • 7
  • 19
  • Ah of course. Good point! In that case, do you happen to be aware of a way I can compare these models? They are, ultimately, drawn from the same data set, but will never overlap in terms of samples... – Atticus29 Aug 24 '13 at 23:12
  • 1
    I'm not aware of any way of doing what you want to do. You could take your original data set and split into two factors (infected & uninfected) with two levels? I'm also not convinced the Poisson link is appropriate here as it's typically used for count data and your response aren't. You could try bootstrapping your models to give you 95% CIs of the coefficients and see if they overlap but is not statistically sound. – phg Aug 24 '13 at 23:37
  • Thanks again, @hgeop! I've actually been struggling with the Poisson link function as well. I chose Poisson because some of the lower values are very common, whereas the higher ones are not. The distribution is neither significantly normal nor Poisson, and I'm not sure what to do about that... – Atticus29 Aug 24 '13 at 23:40
  • 1
    Well a poisson doens't make *any* sense because it literally only takes values on integers. – Dason Aug 24 '13 at 23:41
  • Look, dude, I'm flyin' blind here. These stat books are super confusing. I chose glms specifically because they accommodate non-normal distributions. Any advice on what to do if it doesn't seem that my distribution is any of the other distributions? – Atticus29 Aug 25 '13 at 00:04
  • Well you never answered my question about what your research question actually is. Also it doesn't seem like you have a lot of data - how do you know a normal distribution doesn't work? Note that the normal distribution assumption is on the error terms in the linear model and not on the response variable itself. Also - it's entirely clear that this is more of a stats question and not a programming question at all anymore. – Dason Aug 25 '13 at 00:40
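
For completeness, a rough sketch of the bootstrap idea mentioned in the comments above (with the stated caveat that comparing overlapping confidence intervals is not a formal test). With only five rows per group, some resamples will have a constant MLH value and return an NA coefficient, hence the na.rm:

# Hedged sketch of bootstrapping the MLH coefficient within each group.
boot.coef <- function(dat, n.boot = 1000) {
  replicate(n.boot, {
    idx <- sample(nrow(dat), replace = TRUE)
    coef(glm(ColGrowthCL_6 ~ MLH, family = poisson, data = dat[idx, ]))["MLH"]
  })
}

quantile(boot.coef(infected.data.r.p),   c(0.025, 0.975), na.rm = TRUE)
quantile(boot.coef(uninfected.sampling), c(0.025, 0.975), na.rm = TRUE)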