What is the difference between R vegan species-scores, and Primer Spearman rank correlation of species to axis?

Question

In constrained ordination analysis, like CAP or dbRDA, researchers often want to know how much of the dissimilarity is attributed to specific species. In Primer PERMANOVA, Spearman rank or Pearson correlations of species to the axis is an option to that provides an estimate of the species that characterise the variation between assemblages of species when using CAP or RDA. In R, vegan provides a different measurement, known as species scores, which can be calculated but not without careful consideration of the potential shortcomings https://github.com/vegandevs/vegan/issues/254#issuecomment-334071917. and vegan dbrda species scores are empty despite community matrix provided, when using capscale.

I would like to better understand how the correlations and species scores are calculated in Primer PERMANOVA. Firstly, is there a real difference in what these methods aim to show? What are the benefits and shortcomings of the using Spearman or Pearson correlations over R- vegan's species scores? Does the method of calculating the species-to-axis correlations in Primer suffer from similar problems as detailed in the above links for species scores in capscale or dbrda in R? In Primer, what are the variables used from the community matrix and axis to calculate the correlations between them? Are these raw or the transformed data? And finally, if the correlation method is a better estimate of the relative amount by which species cause differences between assemblages than species scores in R, should this be considered as an alternative to R vegan' species scores?

**if** you don't get an answer here after a reasonable amount of time, you might try posting (with a link to this question) to `r-sig-ecology@r-project.org` ... — Ben Bolker, Oct 04 '17 at 11:20
I'd like to comment, but writing the response takes a long time... This needs a long comment. — Jari Oksanen, Oct 04 '17 at 16:35
From recent feedback on https://github.com/vegandevs/vegan/issues/254 I understand there are some potential pitfalls and serious complications regarding the use of species-axis correlations - hence the reason why this is not used in R `vegan`. It sounds as if rank order correlations will suffer from the same complications imposed by non Euclidian distance in ordination space, as detailed https://stackoverflow.com/questions/46531969/vegan-dbrda-species-scores-are-empty-despite-community-matrix-provided for `dbrda` vegan species scores. Looking forward to PRIMER users feedback on correlations. — Philip Haupt, Oct 05 '17 at 09:09
Thanks Ben and Jari - I will wait for a few days, then repost the question to r-sig-ecology@r-project.org if I receive no responses from PRIMER (correlation based methods) users. I think Jari has provided clear reasons why this is a complicated issue in his post on Github. — Philip Haupt, Oct 05 '17 at 09:12

score 2 · Accepted Answer · answered Oct 05 '17 at 08:01

I have never seen PRIMER and I cannot comment on differences between vegan and PRIMER, but I can explain how we work in vegan.

If you think about species scores of fitted environmental variables as arrows, there are two separate aspects: direction and length. First about direction of the arrow.

In general, the arrows are not parallel to the axes, but they point to the direction to which the species or the environmental variable changes most rapidly. The directions of arrows can be found from linear model lm(y ~ Ax1 + Ax2). If y is a species, this gives the arrow for the species score, and if y is an environmental variable, this gives a fitted vector. Correlating species with axes implies two separate models lm(y ~ Ax1) and lm(y ~ Ax2). The vegan model defines a linear trend surface, and the axis model defines two separate linear trend surfaces with each having steepest gradient along one axis. The following example shows how the linear model related to species scores in PCA in vegan:

library(vegan)
data(varespec)
pc <- rda(varespec)
biplot(pc) # species scores as biplot arrows
plot(envfit(pc ~ Pleuschr + Cladarbu + Cladrang + Cladstel, varespec))
ordisurf(pc ~ Cladstel, varespec, knots = 1, add = TRUE)

The envfit function adds arrows that point to the same direction as species scores, and ordisurf adds linear (knots = 1) trend surface to Cladstel. The isoclines of the linear trend surface are equally spaced and perpendicular to the arrow. Projecting sampling units to the arrow gives the predicted species abundance in this two-dimensional solution. The interpretation of species scores is similar in RDA, but there you must remember to use linear combination scores (display="lc"), and in CCA you must remember to use weighted regression (envfit and ordisurf take care of that automatically, but with lm or other non-vegan tools you need explicit weights).

This approach is not easily changed to use rank correlations. For ranks, you need to project points (sampling units) to a univariate sequence. Often people project on axes (that is, they correlate axes with species). However, better correlation will be found if we find a line through origin that gives the best rank correlation when sampling units are projected onto it for ranks (if a unique line, or a sector containing lines, exists). This would be similar to our approach of finding the direction of steepest change in linear trend surface. This is easily done with Euclidean space, like all vegan ordination spaces are, but not with ranks of projections.

The assumption of linear trend surface is pretty simplistic. It is the model for species in PCA and RDA, and it is the model for constraints in RDA and there it shows how the analysis sees your data (remember "lc" scores!). However, with other variables and with other ordination methods, more complicated response models are often more adequate. These can be fit[ted] using ordisurf with knots > 1.

Then about lengths of the arrows, or distances of species scores from the origin. The envfit() function finds the correct direction, but it scales the arrow length by correlation coefficient. In PCA and RDA, we have several alternative scaling options: see the long description of scaling and correlation in ?scores.cca. The default (correlation = FALSE) scaling makes arrows proportional to absolute change in species abundance. That is, abundant species can change a lot and can have long arrows, but scarce species can change only a bit and have always short arrows. It is absolute change, not relative change. With correlation = TRUE, the arrow lengths are proportional to relative change and will be similar to scaling by correlations used in envfit. Again, study the manual for details (?scores.cca).

score 2 · Answer 2 · answered Oct 07 '17 at 11:52

Please see below a very useful reply from Marti Anderson in response to an email relating to this question above - Copied with her permission:

Dear Philip,

Thanks for your message and interest in all of these methods and their interpretation. I won't comment on the R-based stuff as implemented in vegan, as no doubt Jari is best able to do that, but I would be happy to comment on the raw rank (or Perason) correlation vectors implemented in PRIMER/PERMANOVA software. First, let me emphasise that these are intended to be an exploratory tool. Quite simply, all they do is show the raw (or multiple) correlations of individual species (or other variables) with the ordination axes. Two important caveats to their use or important to note here. First, they most emphatically do not tell you anything directly about the specific role that individual species may have played in driving patterns seen in the dbRDA (or nMDS or PCO or whatever ordination axes you are talking about). They cannot do that because they are being drawn after the fact, and, as we know, the relationships between the original species variables and the ordination axes, in very many cases, are not expected to be linear (or even monotonic) at all. The ordination is being (generally) done in the space of some chosen dissimilarity measure (such as BC or Jaccard, etc.). This is appropriate for lots of reasons (which I am sure you already know and I won't go in to). To see what the patterns are of individual species in a constrained ordination space like dbRDA (or any other ordination), I suggest you use bubble plots, which re a much more refined tool for visualising the patterns. (And incidentally, in PRIMER 7, you can superimpose segmented bubble plots to visualise patterns for a set of species simultaneously in this way. The advantage of bubble plots is that they are able, for any given species, to show any pattern that might be occurring, including step-changes across groups, potential interactions across factors, unimodal or multi-modal patterns along gradients, etc. Now, the use of rank (or Pearson) correlation vectors have the advantage of being able to plot lots of species at once, but they are clearly a blunt tool by comparison, as they can only show well those species that may have increasing or decreasing relationships with the ordination axes, which if often not the whole story. They have their uses, however, particularly in the CAP setting, where the axes have been drawn specifically to maximise group differences. In this case, the groups are separated on the plot, so the species having longer axes drawn as increasing or decreasing across such plots will generally correspond to those species that characterise the group differences. However, even in the CAP example, sometimes the group differences are caused by changes in composition in whole sets of (minor) species acting together multivariately and simultaneously through the dissimilarity measure, and this might not be easily picked up by singular patterns for any of those species when considered independently and individually.

Well, I do hope that I have helped to clarify the issues here a little. The upshot of all of the above is that it is no problem to put these vectors onto any ordination plot you may choose, but they are a heuristic and posterior exercise, rather than a definitive or diagnostic tool in the dissimilarity-based kind of setting, and they only show a certain kind of relationship. Other tools (such as bubble or segmented bubble plots) will be richer in their information content regarding individual species patterns.

I hope the above is helpful! Kind regards, Marti Anderson

What is the difference between R vegan species-scores, and Primer Spearman rank correlation of species to axis?

2 Answers2