0

I am trying to perform a simple RDA using the vegan package to test the effects of depth, basin and sector on genetic population structure using the following data frame.

datafile.

The "ALL" variable is the genetic population assignment (structure).

In case the link to my data doesn't work well, I'll paste a snippet of my data frame here.

enter image description here

I read in the data this way:

RDAmorph_Oct6 <- read.csv("RDAmorph_Oct6.csv")

My problems are two-fold: 1) I can't seem to get my genetic variable to read correctly. I have tried three things to fix this.

gen=rda(ALL ~ Depth + Basin + Sector, data=RDAmorph_Oct6, na.action="na.exclude")
Error in eval(specdata, environment(formula), enclos = globalenv()) : 
  object 'ALL' not found
In addition: There were 12 warnings (use warnings() to see them)

so, I tried things like:

> gen=rda("ALL ~ Depth + Basin + Sector", data=RDAmorph_Oct6, na.action="na.exclude")
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

so I specified numeric

> RDAmorph_Oct6$ALL = as.numeric(RDAmorph_Oct6$ALL)
> gen=rda("ALL ~ Depth + Basin + Sector", data=RDAmorph_Oct6, na.action="na.exclude")
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

I am really baffled. I've also tried specifying each variable with dataset$variable, but this doesn't work either.

The strange thing is, I can get an rda to work if I look the effects of the environmental variables on a different, composite, variable

MC = RDAmorph_Oct6[,5:6]
H_morph_var=rda(MC ~ Depth + Basin + Sector, data=RDAmorph_Oct6, na.action="na.exclude")

Note that I did try to just extract the ALL column for the genetic rda above. This didn't work either. Regardless, this leads to my second problem.

When I try to plot the rda I get a super weird plot. Note the five dots in three places. I have no idea where these come from.

enter image description here

I will have to graph the genetic rda, and I figure I'll come up with the same issue, so I thought I'd ask now.

I've been though several tutorials and tried many iterations of each issue. What I have provided here is I think the best summary. If anyone can give me some clues, I would much appreciate it.

Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
Ella Bowles
  • 101
  • 10
  • It’s not clear what `ALL` is, but can you go back an edit your post as you seem to have not got the plots and data marked up properly. `ALL` would typically be a data frame containing at least 1 column, but I infer that it is a column *in* your data object. – Gavin Simpson Nov 17 '19 at 23:47
  • Hmm, I just edited it and re-did all the links. Strangely, they still don't seem to be working. @ReinstateMonica-G.Simpson, can you suggest another way for me to upload/post them? – Ella Bowles Nov 18 '19 at 14:56
  • Your use of backticks is confusing the parser I think. At least have a newline before the back ticks – Gavin Simpson Nov 18 '19 at 15:38
  • I fixed some of it but at least one of the figures is still broken – Gavin Simpson Nov 18 '19 at 15:40
  • Ah, I see. The links are working now @ReinstateMonica-G.Simpson. Too bad I can't just paste the picture in (it isn't working to do that), but the link works to see it. – Ella Bowles Nov 18 '19 at 15:52

3 Answers3

0

The documentation, ?rda, says that the left-hand side of the formula specifying your model needs to be a data matrix. You can't pass it the name of a variable in the data object as the left-hand side (or at least if this was ever anticipated, doing so exposes bugs in how we parse the formula which is what leads to further errors).

What you want is a data frame containing a variable ALL for the left-hand side of the formula.

This works:

library('vegan')
df <- read.csv('~/Downloads/RDAmorph_Oct6.csv')

ALL <- df[, 'ALL', drop = FALSE]

Notice the drop = FALSE, which stops R from dropping the empty dimension (i.e. converting the single column data frame to a vector.

Then your original call works:

ord <- rda(ALL ~ Basin + Depth + Sector, data = df, na.action = 'na.exclude')
Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
0

The problem is that rda expects a separate df for the first part of the formula (ALL in your code), and does not use the one in the data = argument.

As mentioned above, you can create a new df with the variable needed for analysis, but here's a oneline solution that should also work:

gen <- rda(RDAmorph_Oct6$ALL ~ Depth + Basin + Sector, data = RDAmorph_Oct6, na.action = na.exclude)

MMarton
  • 26
  • 3
0

This is partly similar to Gavin simpson's answer. There is also a problem with the categorical vectors in your data frame. You can either use library(data.table) and the rowid function to set the categorical variables to unique integers. Most preferably, not use them. I also wanted to set the ID vector as site names, but I am too lazy now.

library(data.table)
RDAmorph_Oct6 <- read.csv("C:/........../RDAmorph_Oct6.csv")

#remove NAs before. I like looking at my dataframes before I analyze them.
RDAmorph_Oct6 <- na.omit(RDAmorph_Oct6)

#I removed one duplicate
RDAmorph_Oct6 <- RDAmorph_Oct6[!duplicated(RDAmorph_Oct6$ID),]

#Create vector with only ALL
ALL  <- RDAmorph_Oct6$ALL

#Create data frame with only numeric vectors and remove ALL
dfn  <- RDAmorph_Oct6[,-c(1,4,11,12)]

#Select all categorical vectors.
dfc  <- RDAmorph_Oct6[,c(1,11,12)]

#Give the categorical vectors unique integers doesn't do this for ID (Why?).
dfc2 <- as.data.frame(apply(dfc, 2, function(x) rowid(x)))

#Bind back with numeric data frame
dfnc <- cbind.data.frame(dfn, dfc2)

#Select only what you need
df   <- dfnc[c("Depth", "Basin", "Sector")]

#The rest you know
rda.out <- rda(ALL ~ ., data=df, scale=T)
plot(rda.out, scaling = 2, xlim=c(-3,2), ylim=c(-1,1))

#Also plot correlations
plot(cbind.data.frame(ALL, df))

Sector and depth have the highest variation. Almost logical, since there are only three vectors used. The assignment of integers to the categorical vector has probably no meaning at all. The function assigns from top to bottom unique integers to the following unique character string. I am also not really sure which question you want to answer. Based on this you can organize the data frame.