
I am somewhat new to R, so forgive my basic questions.

I performed a CCA on a full dataset (358 sites, 40 abiotic parameters, 100 species observations).

library(vegan)
env <- read.table("env.txt", header = TRUE, sep = "\t", dec = ",")
otu <- read.table("otu.txt", header = TRUE, sep = "\t", dec = ",")
cca <- cca(otu~., data=env)
cca.plot <- plot(cca, choices=c(1,2))
vif.cca(cca)
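# EnvPar1, EnvPar2, EnvParN are placeholders for the variables kept after checking the VIFs
ccared <- cca(formula = otu ~ EnvPar1 + EnvPar2 + EnvParN, data = env)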
ccared.plot <- plot(ccared, choices=c(1,2))
orditorp(ccared.plot, display="sites")

This works when the tables have no sample names in the first column. Initially, the first column, which contains numeric sample names, was interpreted as a variable, so I used tables without that information. When I then add site names to the plot via orditorp, it labels the sites "row.name=n". I want to use my own sample names, however, so I tried row.names=1 on both tables that include the sample names:

envnames <- read.table("envwithnames.txt", header = TRUE, row.names=1, sep = "\t", dec = ",")
otunames <- read.table("otuwithnames.txt", header = TRUE, row.names=1, sep = "\t", dec = ",")

I also tried every combination of env/otu/envnames/otunames. cca itself ran in every case, but any plot command yielded

plot.ccarownames <- plot(cca(ccarownames, choices=c(1,2)))
Error in rowSums(X) : 'x' must be numeric

My second problem is connected to that: the 358 sites fall into 6 groups (4 x 60, 2 x 59). The complete matrix holds this information in an extra column. Since I couldn't solve the row name problem, I am even more stuck with nominal data. The original matrix contains a first column (sample names; numeric, but easily converted to nominal) and a second column (group identity, nominal), followed by the biological observations.

What I would like to have:

  1. A CCA containing all six groups that colors the sites by group.
  2. A CCA containing only the data for one group (without manually constructing individual input tables).
  3. CCA plots that use my original sample names.
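
For illustration, here is a minimal sketch of how these three goals might be approached with vegan. It runs on simulated data, so `sample_id`, `group`, `pH`, `temp` and the `sp1` to `sp5` columns are all made-up stand-ins for the real tables, and it assumes the sample names have been set as row names and the group column is kept in a separate factor.

```r
library(vegan)

set.seed(1)
n <- 30
# Simulated stand-ins (assumption): numeric sample names, three groups,
# two environmental variables and five species with strictly positive counts.
sample_id <- as.character(101:130)
group <- factor(rep(c("A", "B", "C"), each = 10))
env <- data.frame(pH = rnorm(n, 7, 0.5), temp = rnorm(n, 12, 2),
                  row.names = sample_id)
otu <- as.data.frame(matrix(rpois(n * 5, lambda = 5) + 1, nrow = n,
                            dimnames = list(sample_id, paste0("sp", 1:5))))

# 1. CCA on all sites, with site points coloured by group.
mod_all <- cca(otu ~ pH + temp, data = env)
plot(mod_all, type = "none")
points(mod_all, display = "sites", col = as.integer(group), pch = 19)
legend("topright", legend = levels(group),
       col = seq_along(levels(group)), pch = 19)

# 2. CCA on one group only, by subsetting both tables with the same index.
keep <- group == "A"
otu_a <- otu[keep, ]
env_a <- env[keep, ]
mod_a <- cca(otu_a ~ pH + temp, data = env_a)

# 3. Site labels come from the row names of the input tables.
site_scores <- scores(mod_all, display = "sites", choices = 1:2)
plot(mod_all, type = "none")
text(mod_all, display = "sites", cex = 0.7)
```

The group factor is kept outside `env` here so that `otu ~ .` style formulas would not pick it up as a constraint; whether that is desirable depends on the analysis.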

Any help is appreciated! Really, I have been stuck on this since yesterday morning :/

nouse
  • What do you mean "if I use row names"? If you read them into your table? Or if you use them in your model? This is unclear. I'm also not sure what extra column you are talking about for the groups or how exactly the CCA "fails" in that case. Please take the time to create a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Also when using functions outside of the default packages, make it clear which libraries you are loading. If we can get the same errors as you, we are more likely to be able to help. – MrFlick Jun 16 '14 at 15:10
  • As one of the **vegan** developers, I can pretty confidently say this is user error, somewhere, perhaps repeatedly, along the way. Vegan will use the row names if you set these appropriately on your data frame. If you leave these in the data object, I doubt even `cca()` would work as one of the first things we do is convert the data to matrices via `as.matrix()`, hence you'd get a character matrix if there was any non-numeric info in your data frame *at all*. As MrFlick says, a reproducible example is required to look into this further. – Gavin Simpson Jun 16 '14 at 15:31
  • Hi there, thanks for the quick answers. I should elaborate: first I used read.table(x, header=1, sep/dec), with x including a column of numeric sample names. This column was interpreted as a variable, and the cca got messed up. I then tried read.table(x, header=1, row.names=1, sep/dec) and got the `Error in rowSums(X) : 'x' must be numeric` message. I'll update my initial question. – nouse Jun 17 '14 at 08:35
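
To illustrate the point made in the comments about `as.matrix()`, the following sketch shows how a single non-numeric entry can produce the `rowSums` error even when `row.names=1` is used correctly. The inline table, its column names and the stray "n.a." entry are invented for the example; the real cause in the asker's files may differ.

```r
# Hypothetical tab-separated table with decimal commas; sample S2 has a stray "n.a." entry.
txt <- "sample\tpH\ttemp\nS1\t7,2\t12,5\nS2\t6,8\tn.a.\nS3\t7,0\t11,9"

env_bad <- read.table(text = txt, header = TRUE, row.names = 1,
                      sep = "\t", dec = ",")

# Which columns were read as numeric? Here pH is, temp is not.
col_is_num <- sapply(env_bad, is.numeric)
print(col_is_num)

# as.matrix() on a data frame with any non-numeric column gives a character matrix ...
m <- as.matrix(env_bad)
print(is.character(m))

# ... and rowSums() refuses to work on it.
err <- tryCatch(rowSums(m), error = function(e) conditionMessage(e))
print(err)

# Declaring the stray entry as missing lets the column be read as numeric again.
env_ok <- read.table(text = txt, header = TRUE, row.names = 1,
                     sep = "\t", dec = ",", na.strings = c("NA", "n.a."))
str(env_ok)
```

Running `sapply(x, is.numeric)` on both input tables right after reading them is a cheap way to find the offending column before calling `cca()`.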

1 Answer


I use cca() from vegan myself and have run into some of the same problems, but I was able to solve at least your original "row names" problem. I'm running a CCA on data from 41 soils, with 334 species and 39 environmental factors. In my case I used

rownames(MyDataSet) <- MyDataSet$ObservationNamesColumn

(MyDataSet and ObservationNamesColumn are placeholder names for the sake of the example.) However, I still had environmental factors that weren't numeric (such as soil texture). You could check for non-numeric columns in case there is a mistake in your original dataset, or an abiotic factor that is not being read as numeric for some other reason. To do this, use either str(MyDataSet), which shows the type of each variable, or lapply(MyDataSet, class), which gives the same information in a different format.

If you have abiotic factors that are not numeric (again, such as texture) and you want to remove them, you can create a new dataset containing only the numeric variables. The observation names are kept because they were set as row names. Something like this should work:

MyDataSet.num <- MyDataSet[,sapply(MyDataSet, is.numeric)]

This creates a new data set which has the same rows as the original but only columns (variables) with numeric values. You should be able then to continue your work using this new data set.
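
Putting the two steps together on a small made-up table may make this clearer. The values, the `Group` and `Texture` columns and the numeric sample names below are assumptions chosen to resemble the matrix described in the question. One caveat the sketch works around: if the names column is itself numeric, `is.numeric` would keep it as a variable, so it is dropped once the row names are set.

```r
# Hypothetical stand-in for the original matrix: numeric sample names,
# a group column, one non-numeric factor, and two numeric variables.
MyDataSet <- data.frame(
  ObservationNamesColumn = c(101, 102, 103, 104),
  Group   = c("A", "A", "B", "B"),
  Texture = c("clay", "sand", "loam", "sand"),
  pH      = c(7.2, 6.8, 7.0, 6.5),
  Temp    = c(12.5, 13.1, 11.9, 12.2)
)

# Use the names column as row names, then drop it so it is not treated as a variable.
rownames(MyDataSet) <- MyDataSet$ObservationNamesColumn
MyDataSet$ObservationNamesColumn <- NULL

# Keep the group labels aside as a factor, e.g. for colouring or subsetting later.
groups <- factor(MyDataSet$Group)

# Keep only the numeric columns; drop = FALSE preserves the data frame
# even if a single column remains.
MyDataSet.num <- MyDataSet[, sapply(MyDataSet, is.numeric), drop = FALSE]
str(MyDataSet.num)
```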

I am very new to both R programming and statistics (I'm a microbiologist) but I hope this helps!

Edu VO