I am running a CCA of some ecological data with ~50 sites and several hundred species. I know that you have to be careful when your number of explanatory variables approaches your number of samples. I have 23 explanatory variables, so this isn't a problem for me, but I have also heard that using too many explanatory variables can start to "un-constrain" the CCA.

Are there any guidelines about how many explanatory variables are appropriate? So far, I have just plotted them all and then removed the ones that appear to be redundant (leaving me with 8). Can I use the inertia values to help inform or justify this?

Thanks

setbackademic

1 Answer

This is the same question as "how many variables are too many for regression analysis?". Not almost the same, but exactly the same: CCA is an ordination of the fitted values of a linear regression. In the most severe cases you can over-fit. In CCA this shows up when the first eigenvalues of CCA and of (unconstrained) CA are almost identical and the two ordinations look similar in the first dimensions (you can use Procrustes analysis to check this). The extreme case is that the residual variation disappears entirely, but in ordination you focus on the first dimensions, and there the constraints can be lost much earlier than in the later constrained axes or in the residuals.

More importantly, you must see CCA as a kind of regression analysis and take the same attitude to constraints as to explanatory (independent) variables in regression. If you have no prior hypothesis to study, you have all the model-selection problems of regression analysis plus the problems of multivariate ordination. These are non-technical problems, though, and are better handled somewhere other than Stack Overflow.
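The eigenvalue comparison described above can be sketched numerically. What follows is a minimal NumPy illustration, not the code of any particular ordination package: the function names (`chi_square_matrix`, `ca_eigenvalues`, `cca_eigenvalues`) are hypothetical, the community matrix is synthetic, and the constraints are pure random noise. It assumes the usual textbook construction of CCA as a row-weighted regression of the chi-square standardized data on the constraints, followed by an SVD of the fitted values.

```python
import numpy as np

def chi_square_matrix(Y):
    """Return the chi-square standardized matrix used by CA, plus row weights."""
    P = Y / Y.sum()
    r = P.sum(axis=1)                      # row (site) weights
    c = P.sum(axis=0)                      # column (species) weights
    Q = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return Q, r

def ca_eigenvalues(Y):
    """Eigenvalues of unconstrained CA: squared singular values of Q."""
    Q, _ = chi_square_matrix(Y)
    return np.linalg.svd(Q, compute_uv=False) ** 2

def cca_eigenvalues(Y, X):
    """Eigenvalues of the constrained axes: SVD of the weighted-regression fit."""
    Q, r = chi_square_matrix(Y)
    Xc = X - r @ X                         # centre constraints by weighted means
    Xw = Xc * np.sqrt(r)[:, None]          # apply square-root row weights
    U, s, _ = np.linalg.svd(Xw, full_matrices=False)
    U = U[:, s > 1e-10 * s.max()]          # orthonormal basis of the constraint space
    fitted = U @ (U.T @ Q)                 # projection = fitted values of the regression
    return np.linalg.svd(fitted, compute_uv=False) ** 2

# Synthetic example: 50 sites, 200 species, constraints that are pure noise.
rng = np.random.default_rng(0)
n_sites, n_species = 50, 200
Y = rng.poisson(2.0, size=(n_sites, n_species)) + 1   # +1 avoids empty rows/columns
X_all = rng.normal(size=(n_sites, n_sites - 1))

ca1 = ca_eigenvalues(Y)[0]
ratios = {}
for q in (3, 8, 23, 49):
    # Nested subsets of the same noise variables, so the ratio cannot decrease.
    ratios[q] = cca_eigenvalues(Y, X_all[:, :q])[0] / ca1
    print(f"{q:2d} constraints: first CCA eigenvalue / first CA eigenvalue = {ratios[q]:.3f}")
```

A ratio close to 1 means the first constrained axis is essentially the first unconstrained CA axis, which is the symptom described in the answer. With one constraint fewer than the number of sites, the ratio reaches 1 exactly, even though the constraints here carry no information at all.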

Jari Oksanen
    A small comment on sample sizes: your sample size is ~50 observations; the number of species does not matter. Would you fit a regression with 23 explanatory variables to a sample of 50 observations? Hardly. Would you fit a regression with 8 explanatory variables to 50 points? I don't know, but you shouldn't. Think in these terms when you consider the number of constraints you need. – Jari Oksanen Jan 12 '17 at 08:15
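To put rough numbers on the regression analogy in the comment, here is a small simulation sketch. It assumes ordinary least squares with an intercept, a response that is pure noise, and predictors that are pure noise; the helper name `mean_null_r2` is hypothetical. Under those assumptions the expected R² is q / (n - 1) for q predictors and n observations, so any R² the simulation reports is over-fitting and nothing else.

```python
import numpy as np

def mean_null_r2(n_obs, n_pred, n_sim=500, seed=0):
    """Average R^2 of OLS fits in which response and predictors are unrelated noise."""
    rng = np.random.default_rng(seed)
    r2 = np.empty(n_sim)
    for i in range(n_sim):
        # Design matrix: intercept column plus n_pred noise predictors.
        X = np.column_stack([np.ones(n_obs), rng.normal(size=(n_obs, n_pred))])
        y = rng.normal(size=n_obs)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        r2[i] = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return r2.mean()

null_r2 = {q: mean_null_r2(50, q) for q in (8, 23)}
for q, value in null_r2.items():
    print(f"{q:2d} noise predictors, 50 observations: "
          f"mean R^2 = {value:.2f} (theory {q / 49:.2f})")
```

The same arithmetic applies to the constraints in a CCA, because the constrained axes are built from regression fitted values.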