I've created a custom function that takes 4 inputs, runs some filtering, then performs a permanova using vegan::adonis2()
The required inputs are:
sample_data: a tibble
primary_vars: name of a column in sample_data
dists: distance object
covars: a character vector corresponding to a set of column names in sample_data
And the function is as follows:
run_adonis <- function(sample_data, primary_vars, dists, covars) {
#remove outliers, only important for my own data
var_name <- substitute(primary_vars)
covars_name <- covars
outliers <- boxplot.stats(sample_data[[var_name]])$out
filtered_data <- sample_data[!(sample_data[[var_name]] %in% outliers), ]
filtered_dist <- dists %>% as.matrix %>% .[filtered_data$sample_name, filtered_data$sample_name] %>% as.dist
#run adonis2
adonis_formula <- paste("filtered_dist ~", paste(c(covars_name, var_name), collapse = " + ")) %>% as.formula()
result <- adonis2(adonis_formula, by = "margin", data = filtered_data, na.action = na.exclude, parallel = 2)
return(result)
}
Running a repex:
library(vegan)
library(tidyverse)
library(furrr)
varechem <- get(data(varechem)) %>% as_tibble(rownames = "sample_name")
dist_bray <- get(data(varespec)) %>% vegdist(., method = "bray")
#Select the set of "covariables" I want to adjust for
covariables <- c("K", "P", "N", "Mn")
So my function works exactly as I want it:
set.seed(2023)
run_adonis(varechem, S, dist_bray, covariables)
#which is the equivalent of:
set.seed(2023)
vegan::adonis2(dist~K+P+N+Mn+S, data=varechem, by="margin", na.action=na.exclude, parallel = 2)
So far, so good!
Now this is where I get stuck. I want to use furrr to run this function several times using a combination of primary_vars
and dists
. The covars and sample_data stay the same.
For example, say I have 2 more distances I'm interested in:
dist_rpca <- get(data(varespec)) %>% vegdist(., method = "robust.aitchison")
dist_hell <- get(data(varespec)) %>% vegdist(., method = "hellinger")
#store the dist names
dist_names <- c("dist_bray", "dist_rpca", "dist_hell")
#and a few other primary variables of interest:
primaires <- c("S", "pH", "Mo")
What I want is to end up with a list which stores the output of multiple adonis2() tests. The number of models will be equal to length(primaries)*length(dist_names). So in this case 9. In other words, always using 1 distance matrix ~ 1 of my "primaries" + all of "covariables".
So far my approach has been to create the combinations of dists x primaries I want to run in my model:
dists_primary_combinations <- expand.grid(dist_names = dist_names, primaries = primaries)
And I was hoping I could feed this into future_map, but I'm quite off the mark:
adonis_mods <- future_map(dists_inf_var_combinations,
~run_adonis(varechem, primary_vars = .x$primaires, dists = .x$dist_names, covars = covariables))
I totally get why this fails, and I fear the solution will require a change in the original function but just posting this to show where my thought process was. I can get what using a nested loop but I'm really hoping to get this working as a function + furrr so I can scale it to much bigger datasets and run in parallel.
Any help would be much appreciated!