0

I've created a custom function that takes 4 inputs, runs some filtering, then performs a permanova using vegan::adonis2()

The required inputs are:
sample_data: a tibble
primary_vars: name of a column in sample_data
dists: distance object
covars: a character vector corresponding to a set of column names in sample_data

And the function is as follows:

run_adonis <- function(sample_data, primary_vars, dists, covars) {
  #remove outliers, only important for my own data
  var_name <- substitute(primary_vars)
  covars_name <- covars
  outliers <- boxplot.stats(sample_data[[var_name]])$out
  filtered_data <- sample_data[!(sample_data[[var_name]] %in% outliers), ]
  filtered_dist <- dists %>% as.matrix %>% .[filtered_data$sample_name, filtered_data$sample_name] %>% as.dist
  
  #run adonis2
  adonis_formula <- paste("filtered_dist ~", paste(c(covars_name, var_name), collapse = " + ")) %>% as.formula()
  result <- adonis2(adonis_formula, by = "margin", data = filtered_data, na.action = na.exclude, parallel = 2)
  return(result)
}
    

Running a repex:

library(vegan)
library(tidyverse)
library(furrr)
varechem <- get(data(varechem)) %>% as_tibble(rownames = "sample_name")
dist_bray <- get(data(varespec)) %>% vegdist(.,  method = "bray")

#Select the set of "covariables" I want to adjust for
covariables <- c("K", "P", "N", "Mn")

So my function works exactly as I want it:

set.seed(2023)
run_adonis(varechem, S, dist_bray, covariables)

#which is the equivalent of:

set.seed(2023)
vegan::adonis2(dist~K+P+N+Mn+S, data=varechem, by="margin", na.action=na.exclude, parallel = 2)

So far, so good!

Now this is where I get stuck. I want to use furrr to run this function several times using a combination of primary_vars and dists. The covars and sample_data stay the same.

For example, say I have 2 more distances I'm interested in:

dist_rpca <- get(data(varespec)) %>% vegdist(.,  method = "robust.aitchison")
dist_hell <- get(data(varespec)) %>% vegdist(.,  method = "hellinger")

#store the dist names
dist_names <- c("dist_bray", "dist_rpca", "dist_hell")

#and a few other primary variables of interest:
primaires <- c("S", "pH", "Mo")

What I want is to end up with a list which stores the output of multiple adonis2() tests. The number of models will be equal to length(primaries)*length(dist_names). So in this case 9. In other words, always using 1 distance matrix ~ 1 of my "primaries" + all of "covariables".

So far my approach has been to create the combinations of dists x primaries I want to run in my model:

dists_primary_combinations <- expand.grid(dist_names = dist_names, primaries = primaries)

And I was hoping I could feed this into future_map, but I'm quite off the mark:

adonis_mods <- future_map(dists_inf_var_combinations, 
                                 ~run_adonis(varechem, primary_vars = .x$primaires, dists = .x$dist_names, covars = covariables))

I totally get why this fails, and I fear the solution will require a change in the original function but just posting this to show where my thought process was. I can get what using a nested loop but I'm really hoping to get this working as a function + furrr so I can scale it to much bigger datasets and run in parallel.

Any help would be much appreciated!

mestaki
  • 49
  • 4

0 Answers0