I am analysing data for an observational study on trout growth. So there are plenty of NAs in the dataframe. Also, different treatments have different numbers of observations. So I think this can be called an unbalanced design?
Year Site FishID Size Cover AGE L1
2010 LT1 10_LT1 _ 11 Large Heavy 2 5.88
2010 LT3 10_LT3 _ 14 Large Heavy 2 5.228571429
2010 SO4 10_SO4 _ 8 Small Open 0 NA
2010 SO5 10_SO5 _ 22 Small Open 0 NA
2011 LT1 11_LT1 _ 14 Large Heavy 1 6.44
2011 LT1 11_LT1 _ 15 Large Heavy 1 6.25
2011 LT1 11_LT1 _ 16 Large Heavy 1 6.421052632
2011 LT1 11_LT1 _ 18 Large Heavy 1 7.74
2011 SO5 11_SO5 _ 6 Small Open 1 7.7625
2011 SO5 11_SO5 _ 8 Small Open 1 6.914285714
2011 SO5 11_SO5 _ 13 Small Open 1 6.5
2011 SO5 11_SO5 _ 16 Small Open 1 7.2
2011 ST1 11_ST1 _ 21 Small Heavy 0 NA
2011 ST2 11_ST2 _ 10 Small Heavy 0 NA
2011 ST2 11_ST2 _ 5 Small Heavy 0 NA
2011 ST3 11_ST3 _ 20 Small Heavy 0 NA
2011 ST4 11_ST4 _ 5 Small Heavy 0 NA
2011 ST1 11_ST1 _ 9 Small Heavy 1 7.521428571
2011 ST1 11_ST1 _ 17 Small Heavy 1 8.169230769
2011 ST1 11_ST1 _ 20 Small Heavy 1 7.03125
My fixed effects are stream size: 2 levels stream cover: 2 levels, Year: 2 levels and age: 3 levels.
I also ramdomly chose a bunch of sites to examine stream size and stream cover effects. So I presume site is classed as a random effect?
L1
is a proxy for growth rate.
At this stage my maximal model looks like this:
m1=lme(L1~Year*Age*Size*Cover, random=~1|Site ,data=Trout_Growth,method="ML",na.action=na.exclude)
qqnorm(residuals(m1))
qqline(residuals(m1)) # Normality OK
plot(density(residuals(m1),na.rm=T)) ***## heteroscedasticity present (lme should handle this??) ##***
summary(m1)
anova(m1,test=T,type="marginal") ***## marginal used because of unbalanced design?? ##***
I have of course had a good look at this website which is great but I am still unsure of correct notation for inclusion of the random factor Site. (lme and lmer seem to use different notation. Is this correct?)
I Suppose I am asking if this is the correct starting point to refine my model. I am starting to really enjoy R but without regular thumbs up regarding the R code, the temptation is to slip back to comfy windows based stats software.
Any opinions or advice would be greatly appreciated.
Diarm