
I have a data.table in R for example:

x <- data.table( id = c(1:10), count=c(10,110,20,30,5,40,50,15,20,70))

I want to select all subsets (combinations) of id where the sum of count over the subset is between 90 and 110. One such combination is

id IN (1,3,4,5,6) 

because the sum of count is 105:

x[id %in% c(1,3,4,5,6), sum(count)]
# [1] 105

How do I get all possible combinations?

Judy
  • What's the scale of the real problem? A brute force method might work here, but it's unlikely to work with big data. In the problem you've posed above, there are 1023 possible combinations, but with 20 ID values there are over a million. – DaveArmstrong Mar 11 '21 at 20:21
  • The real problem is about 1500 ids. – Judy Mar 11 '21 at 20:26
  • You may have asked a question that is more mathematical than data-oriented here. My suggestion would be to check out the 'rfast' library on CRAN. Perhaps this function: "Row - Wise matrix/vector count the frequency of a value". There may be an optimized data.table solution with enough hand crafting, but usually questions like this relate to well-known complex mathematical operations that specific libraries address. – rferrisx Mar 12 '21 at 18:37

1 Answer


Here is a brute-force solution for your current dataset:

# Enumerate all 2^n subsets as 0/1 indicator columns (one column per bit
# pattern) and compute every subset sum in a single matrix product.
p <- crossprod(
  x$count,
  sapply(
    seq(2^nrow(x)),
    function(n) head(as.integer(intToBits(n)), nrow(x))
  )
)

# Decode each qualifying bit pattern back into the corresponding ids.
res <- lapply(
  which(p >= 90 & p <= 110, arr.ind = TRUE)[, "col"],
  function(i) x$id[which(head(intToBits(i), nrow(x)) > 0)]
)

and you will see

> head(res)
[[1]]
[1] 2

[[2]]
[1] 3 4 6

[[3]]
[1] 1 3 4 6

[[4]]
[1] 3 4 5 6

[[5]]
[1] 1 3 4 5 6

[[6]]
[1] 1 4 7

However, it doesn't scale up to larger datasets with many more ids, since the number of subsets grows as 2^n.
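For larger inputs, one option is a depth-first enumeration that prunes any branch whose partial sum already exceeds the upper bound, which is valid because all counts are positive. This is only a sketch (the helper `find_subsets` and its defaults are mine, not from the original answer), and it is still exponential in the worst case, so it will not magically handle 1500 ids either:

```r
# Depth-first subset enumeration with pruning. Counts are positive, so any
# branch whose partial sum would exceed `hi` can be cut off immediately.
# Sorting counts in decreasing order tends to make the pruning bite sooner.
find_subsets <- function(count, id, lo = 90, hi = 110) {
  ord   <- order(count, decreasing = TRUE)
  count <- count[ord]
  id    <- id[ord]
  n     <- length(count)
  res   <- list()
  recurse <- function(i, chosen, s) {
    # record the current subset once, at the call where it was completed
    if (s >= lo) res[[length(res) + 1]] <<- sort(id[chosen])
    if (i > n) return(invisible(NULL))
    for (j in i:n) {
      # only descend if adding count[j] stays within the upper bound
      if (s + count[j] <= hi) recurse(j + 1, c(chosen, j), s + count[j])
    }
  }
  recurse(1, integer(0), 0)
  res
}

count <- c(10, 110, 20, 30, 5, 40, 50, 15, 20, 70)
res2 <- find_subsets(count, 1:10)
# each element of res2 is a sorted vector of ids whose counts sum to [90, 110]
```

How much the pruning helps depends on the data: if many counts are small relative to the bounds, most branches survive and the search stays close to exhaustive.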

ThomasIsCoding