
I have a data.table in R for example:

x <- data.table( id = c(1:10), count=c(10,110,20,30,5,40,50,15,20,70))

I want to select all subsets (combinations) of id where the sum of count over the subset is between 90 and 110. One such combination is

id IN (1,3,4,5,6) 

because the sum of count is 105:

x[id %in% c(1,3,4,5,6), sum(count)]
# [1] 105

How do I get all possible combinations?

Judy
  • What's the scale of the real problem? A brute force method might work here, but it's unlikely to work with big data. In the problem you've posed above, there are 1023 possible combinations, but with 20 ID values there are over a million. – DaveArmstrong Mar 11 '21 at 20:21
  • The real problem is about 1500 ids. – Judy Mar 11 '21 at 20:26
  • You may have asked a question that is more mathematical than data-oriented here. My suggestion would be to check out the 'rfast' library on CRAN. Perhaps this function: "Row - Wise matrix/vector count the frequency of a value". There may be an optimized data.table solution with enough hand crafting, but usually questions like this relate to well-known complex mathematical operations that specific libraries address. – rferrisx Mar 12 '21 at 18:37

1 Answer


Here is a brute-force solution for your current dataset:

# Enumerate all 2^n subsets as 0/1 indicator columns (one column per bit
# pattern) and compute every subset sum in a single matrix product.
p <- crossprod(
  x$count,
  sapply(
    seq(2^nrow(x)),
    function(n) head(as.integer(intToBits(n)), nrow(x))
  )
)

# Decode each qualifying bit pattern back into the corresponding ids.
res <- lapply(
  which(p >= 90 & p <= 110, arr.ind = TRUE)[, "col"],
  function(i) x$id[which(head(intToBits(i), nrow(x)) > 0)]
)

and you will see

> head(res)
[[1]]
[1] 2

[[2]]
[1] 3 4 6

[[3]]
[1] 1 3 4 6

[[4]]
[1] 3 4 5 6

[[5]]
[1] 1 3 4 5 6

[[6]]
[1] 1 4 7

However, it doesn't scale up to larger datasets with many more ids, since the number of subsets grows as 2^n.
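For larger inputs, one option is a depth-first enumeration that prunes any branch whose partial sum already exceeds the upper bound, which is valid because all counts are positive. This is only a sketch (the helper `find_subsets` and its defaults are mine, not from the original answer), and it is still exponential in the worst case, so it will not magically handle 1500 ids either:

```r
# Depth-first subset enumeration with pruning. Counts are positive, so any
# branch whose partial sum would exceed `hi` can be cut off immediately.
# Sorting counts in decreasing order tends to make the pruning bite sooner.
find_subsets <- function(count, id, lo = 90, hi = 110) {
  ord   <- order(count, decreasing = TRUE)
  count <- count[ord]
  id    <- id[ord]
  n     <- length(count)
  res   <- list()
  recurse <- function(i, chosen, s) {
    # record the current subset once, at the call where it was completed
    if (s >= lo) res[[length(res) + 1]] <<- sort(id[chosen])
    if (i > n) return(invisible(NULL))
    for (j in i:n) {
      # only descend if adding count[j] stays within the upper bound
      if (s + count[j] <= hi) recurse(j + 1, c(chosen, j), s + count[j])
    }
  }
  recurse(1, integer(0), 0)
  res
}

count <- c(10, 110, 20, 30, 5, 40, 50, 15, 20, 70)
res2 <- find_subsets(count, 1:10)
# each element of res2 is a sorted vector of ids whose counts sum to [90, 110]
```

How much the pruning helps depends on the data: if many counts are small relative to the bounds, most branches survive and the search stays close to exhaustive.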

ThomasIsCoding