My problem: need to find all disjoint (non-overlapping) sets from a set of sets.
Background: I am using comparative phylogenetic methods to study trait evolution in birds. I have a tree with ~300 species. This tree can be divided into subclades (i.e. subtrees). If two subclades do not share species, they are independent. I'm looking for an algorithm (and an R implementation if possible) that will find all possible subclade partitions where each subclade has greater than 10 taxa and all are independent. Each subclade can be considered a set and when two subclades are independent (do not share species) these subclades are then disjoint sets.
Hope this is clear and someone can help.
Cheers, Glenn
The following code produces an example dataset. Where subclades is a list of all possible subclades (sets) from which I'd like to sample X disjoint sets, where the length of the set is Y.
###################################
# Example Dataset
###################################
library(ape)
library(phangorn)
library(TreeSim)
library(phytools)
##simulate a tree
n.taxa <- 300
tree <- sim.bd.taxa(n.taxa,1,lambda=.5,mu=0)[[1]][[1]]
tree$tip.label <- seq(n.taxa)
##extract all monophyletic subclades
get.all.subclades <- function(tree){
tmp <- vector("list")
nodes <- sort(unique(tree$edge[,1]))
i <- 282
for(i in 1:length(nodes)){
x <- Descendants(tree,nodes[i],type="tips")[[1]]
tmp[[i]] <- tree$tip.label[x]
}
tmp
}
tmp <- get.all.subclades(tree)
##set bounds on the maximum and mininum number of tips of the subclades to include
min.subclade.n.tip <- 10
max.subclade.n.tip <- 40
##function to replace trees of tip length exceeding max and min with NA
replace.trees <- function(x, min, max){
if(length(x) >= min & length(x)<= max) x else NA
}
#apply testNtip across all the subclades
tmp2 <- lapply(tmp, replace.trees, min = min.subclade.n.tip, max = max.subclade.n.tip)
##remove elements from list with NA,
##all remaining elements are subclades with number of tips between
##min.subclade.n.tip and max.subclade.n.tip
subclades <- tmp2[!is.na(tmp2)]
names(subclades) <- seq(length(subclades))