
I want to remove the list elements that are entirely contained in (i.e. are a subset of) another element of the list. For example, B is a subset of A and E is a subset of C, so B and E should be removed.

MyList <- list(A=c(1,2,3,4,5), B=c(3,4,5), C=c(6,7,8,9), E=c(7,8))
MyList
$A
[1] 1 2 3 4 5
$B
[1] 3 4 5
$C
[1] 6 7 8 9
$E
[1] 7 8

# desired result; RemoveSubElements() stands for the function I am looking for
MyListUnique <- RemoveSubElements(MyList)
MyListUnique
$A
[1] 1 2 3 4 5
$C
[1] 6 7 8 9

Any ideas? Is there a known function that does this?
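The relation described here is "is a subset of" rather than "intersects": A and B intersect in both directions, but only B is contained in A. As an illustration added to the thread (not part of the original question), base R can test containment with `%in%` or `setdiff()`, which is the building block the answers below rely on:

```r
MyList <- list(A = c(1, 2, 3, 4, 5), B = c(3, 4, 5), C = c(6, 7, 8, 9), E = c(7, 8))

# x is a subset of y when every value of x also occurs in y
all(MyList$B %in% MyList$A)   # TRUE:  B is contained in A
all(MyList$A %in% MyList$B)   # FALSE: A has values (1, 2) that B lacks
all(MyList$E %in% MyList$C)   # TRUE:  E is contained in C

# equivalent test with setdiff(): nothing of x is left over after removing y's values
length(setdiff(MyList$E, MyList$C)) == 0   # TRUE
```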

Ronak Shah
Marcelo
  • If efficiency isn't an issue, maybe `idx <- subset(expand.grid(seq_along(MyList), seq_along(MyList)), Var1 != Var2); rem <- unique(names(which(lengths(mapply(setdiff, MyList[idx[,1]], MyList[idx[,2]])) == 0))); MyList[!names(MyList) %in% rem]`. – lukeA Mar 06 '17 at 12:31
  • It might be more convenient to start off with `tmp = crossprod(table(stack(MyList)))` or a sparse alternative. For example, in this case, something like `tmp & (diag(tmp)[col(tmp)] - tmp)` seems to indicate correctly which (rows) are part of which (columns) (i.e. `rownames(which(tmp & (diag(tmp)[col(tmp)] - tmp), TRUE))` seems to work here). Could you provide a bit more context/cases on the problem? – alexis_laz Mar 06 '17 at 12:46
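A sketch expanding on the `crossprod()` idea in the comment above, assuming the same `MyList`: `crossprod(table(stack(...)))` counts the values shared by every pair of elements, and its diagonal holds each element's size. Row i is then a proper subset of column j exactly when the shared count equals the size of i and j is strictly larger. This stricter test is my own variation rather than the commenter's expression, and it does not flag pairs that merely overlap. The names `incidence`, `overlap`, `size`, `is_sub` and `drop` are illustrative.

```r
MyList <- list(A = c(1, 2, 3, 4, 5), B = c(3, 4, 5), C = c(6, 7, 8, 9), E = c(7, 8))

# incidence table (values x elements); unique() guards against repeated values
# inside one element, which would otherwise inflate the counts
incidence <- table(stack(lapply(MyList, unique)))

# overlap[i, j] = number of values shared by elements i and j
overlap <- crossprod(incidence)
size <- diag(overlap)  # number of distinct values in each element

# row i is a proper subset of column j: all of i is shared with j and j is larger
# (overlap == size compares row i with size[i] through column-major recycling)
is_sub <- overlap == size & outer(size, size, "<")

drop <- rownames(is_sub)[rowSums(is_sub) > 0]
drop  # "B" "E"
MyListUnique <- MyList[!names(MyList) %in% drop]
```

For long lists, the dense table could be swapped for a sparse matrix, as the comment suggests; the subset test stays the same.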

2 Answers


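As long as your list is not too long (every element is checked against all elements that are at least as long, so the work grows quadratically with the number of elements), you can use an approach like the following: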

# preparation
MyList <- MyList[order(lengths(MyList))]
idx <- vector("list", length(MyList))
# loop through the list and compare each element with all later (equal or longer) elements
for(i in seq_along(MyList)) {
  idx[[i]] <- any(sapply(MyList[-seq_len(i)], function(x) all(MyList[[i]] %in% x)))
}
# subset the list (the result keeps the length-sorted order, so C comes before A)
MyList[!unlist(idx)]        
#$C
#[1] 6 7 8 9
#
#$A
#[1] 1 2 3 4 5
talat
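If a reusable function like the `RemoveSubElements()` from the question is wanted, the loop above can be wrapped up. The sketch below is hypothetical (the name is borrowed from the question) and makes two assumed design choices that differ from the loop: it keeps the list in its original order, and it first drops exact duplicate sets so that exactly one copy of an identical pair survives.

```r
MyList <- list(A = c(1, 2, 3, 4, 5), B = c(3, 4, 5), C = c(6, 7, 8, 9), E = c(7, 8))

# hypothetical helper named after the function requested in the question
RemoveSubElements <- function(lst) {
  # treat elements as sets: drop repeated values and keep only the first copy of identical sets
  sets <- lapply(lst, function(v) sort(unique(v)))
  keep <- !duplicated(sets)
  lst <- lst[keep]
  sets <- sets[keep]
  # an element is redundant if all of its values occur in some other remaining element
  redundant <- vapply(seq_along(sets), function(i) {
    any(vapply(sets[-i], function(s) all(sets[[i]] %in% s), logical(1)))
  }, logical(1))
  lst[!redundant]
}

MyListUnique <- RemoveSubElements(MyList)
names(MyListUnique)  # "A" "C"
```

Because duplicates are removed up front, any element still contained in another one is a proper subset, so a plain `%in%` test is enough inside the loop.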

Similar to the other answer, but hopefully clearer, using a helper function and two nested sapply() calls.

# helper function: TRUE if x is a proper subset of y; && short-circuits, so setdiff() is skipped when the sets are equal
is.proper.subset <- function(x,y) !setequal(x,y) && length(setdiff(x,y))==0

#double loop over the list to find elements which are proper subsets of other elements
idx <- sapply(MyList, function(x) any(sapply(MyList, function(y) is.proper.subset(x,y))))

#filter out those that are proper subsets
MyList[!idx]
$A
[1] 1 2 3 4 5

$C
[1] 6 7 8 9
James
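One consequence of using a *proper* subset test: two identical elements are not proper subsets of each other, so both are kept, whereas the length-sorted loop in the other answer removes one of the copies. A quick check with a made-up list `dupList` (D is a copy of C); the optional `duplicated()` step at the end is an assumption about the desired behaviour, not part of this answer:

```r
# made-up example: D is an exact copy of C
dupList <- list(A = c(1, 2, 3), C = c(6, 7), D = c(6, 7))

is.proper.subset <- function(x, y) !setequal(x, y) && length(setdiff(x, y)) == 0
idx <- sapply(dupList, function(x) any(sapply(dupList, function(y) is.proper.subset(x, y))))
names(dupList[!idx])  # "A" "C" "D": equal sets are not proper subsets of each other

# optional: keep only the first copy of identical sets before filtering
dedup <- dupList[!duplicated(lapply(dupList, function(v) sort(unique(v))))]
names(dedup)  # "A" "C"
```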