4

I am relatively new to phylogenetic regression models. In the past I used PGLS when I had only 1 entry for each species in my tree. Now I have a dataset with thousands of records for a total of 9 species and I would like to run a phylogenetic model. I read the tutorial of the most common packages (e.g. caper) but I am unsure how to build the model.

When I try to create the object for caper, i.e. using:

obj <- comparative.data(phy = Tree, data = Data, names.col = species, vcv = TRUE, na.omit = FALSE, warn.dropped = TRUE)

I get the message:

Error in row.names<-.data.frame(*tmp*, value = value) : duplicate 'row.names' are not allowed In addition: Warning message: non-unique values when setting 'row.names': ‘Species1’, ‘Species2’, ‘Species3’, ‘Species4’, ‘Species5’, ‘Species6’, ‘Species7’, ‘Species8’, ‘Species9’

I understood that I may solve this by applying a MCMCglmm model but I am unfamiliar with Bayesian models.

Thanks in advance for your help.

1 Answers1

5

This is indeed not going to work with a simple PGLS from caper because it cannot deal with individuals as a random effect. I suggest you use MCMCglmm that is not much more complex to understand and will allow you to have individuals as a random effect. You can find excellent documentation from the package's author here or here or an alternative documentation that's more dealing with some specific aspects of the package (namely tree uncertainty) here.

Really briefly to get you going:

## Your comparative data
comp_data <- comparative.data(phy = my_tree, data =my_data,
      names.col = species, vcv = TRUE)

Note that you can have a specimen column that can look like this:

   taxa        var1 var2 specimen
1     A  0.08730689    a    spec1
2     B  0.47092692    a    spec1
3     C -0.26302706    b    spec1
4     D  0.95807782    b    spec1
5     E  2.71590217    b    spec1
6     A -0.40752058    a    spec2
7     B -1.37192856    a    spec2
8     C  0.30634567    b    spec2
9     D -0.49828379    b    spec2
10    E  1.42722363    b    spec2

You can then set up your formula (similar to a simple lm formula):

## Your formula
my_formula <- variable1 ~ variable2

And your MCMC settings:

## Setting the prior list (see the MCMCglmm course notes for details)
prior <- list(R = list(V=1, nu=0.002),
              G = list(G1 = list(V=1, nu=0.002)))

## Setting the MCMC parameters
## Number of interactions
nitt <- 12000

## Length of burnin
burnin <- 2000

## Amount of thinning
thin <- 5

And you should then be able to run a default MCMCglmm:

## Extracting the comparative data
mcmc_data <- comp_data$data

## As MCMCglmm requires a column named animal for it to identify it as a phylo
## model we include an extra column with the species names in it.
mcmc_data <- cbind(animal = rownames(mcmc_data), mcmc_data)
mcmc_tree <- comp_data$phy

## The MCMCglmmm
mod_mcmc <- MCMCglmm(fixed = my_formula, 
                     random = ~ animal + specimen, 
                     family = "gaussian",
                     pedigree = mcmc_tree, 
                     data = mcmc_data,
                     nitt = nitt,
                     burnin = burnin,
                     thin = thin,
                     prior = prior)
user2352714
  • 314
  • 1
  • 15
Thomas Guillerme
  • 1,747
  • 4
  • 16
  • 23
  • Thanks a lot. Will give it a go and let you know whether I can handle it or not. – Marco Gamba Jul 04 '18 at 20:34
  • What is the gain of specifying individuals as a random factor? My understanding is that all it does is accounting for overdispersion (as we would do in a generalised mixed effect model, as suggested by Zuur et al. in one of their books). Am I correct? Thank you in advance! – Marco Plebani Nov 14 '18 at 06:44
  • 1
    Is there a function (perhaps in `MCMCglmm`) that allows to "assemble" the comparative data when multiple observations per species are present? In the code suggested by @thomas-guillerme `comp_data` is still created using `comparative.data` from `caper`, which gives the error described by @marco-gamba . Cheers – Marco Plebani Nov 14 '18 at 09:13
  • You could use the function `as.mulTree` from the [`mulTree`](https://github.com/TGuillerme/mulTree) package which creates objects that deal with multiple specimens and can be passed to `MCMCglmm` via `mulTree`. See the `mulTree` documentation example for more details. – Thomas Guillerme Nov 15 '18 at 00:01