
My question is quite similar to this one: "Find a subset from a set of integer whose sum is closest to a value".

That question discusses only the algorithm, but I want to solve it in R. I'm quite new to R and have worked out a solution, but I wonder whether there is a more efficient way.

Here is my example:

# Define a vector; find the subset whose sum is closest to the reference number 20.
A <- c(2,5,6,3,7)

# display all the possible combinations
y1 <- combn(A,1)
y2 <- combn(A,2)
y3 <- combn(A,3)
y4 <- combn(A,4)
y5 <- combn(A,5)
Y <- list(y1,y2,y3,y4,y5)

# calculate the distance to the reference number of each combination
s1 <- abs(apply(y1,2,sum)-20)
s2 <- abs(apply(y2,2,sum)-20)
s3 <- abs(apply(y3,2,sum)-20)
s4 <- abs(apply(y4,2,sum)-20)
s5 <- abs(apply(y5,2,sum)-20)
S <- list(s1,s2,s3,s4,s5)

# find the minimum difference
M <- sapply(S,FUN=function(x) list(which.min(x),min(x)))
Mm <- which.min(as.numeric(M[2,]))

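# return the right combination: column M[[1, Mm]] of the size-Mm matrix
# (indexing with both entries of M[, Mm] adds a wrong column whenever the minimum distance is not 0)
Y[[Mm]][, M[[1, Mm]]]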

So the answer is 2, 5, 6, 7, whose sum is exactly 20.

How can I refine this program? In particular, can the five combn() calls and the five apply() calls be done in one go? When A has more items, I would like the code to adapt to length(A) automatically.
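A minimal sketch of one way to do this: loop over every subset size in seq_along(A) and let combn() compute the sums itself through its FUN argument, so nothing is hard-coded to five sizes. The function name closest_subset and its target argument are illustrative, not from the question. It combines indices rather than values because combn() treats a single number x as seq_len(x), and on ties it keeps the first subset found, smallest size first.

```r
# Brute force over all non-empty subsets, for any length(A).
closest_subset <- function(A, target) {
  best <- NULL
  best_dist <- Inf
  for (k in seq_along(A)) {
    # one sum per k-element combination of the indices of A
    sums <- combn(seq_along(A), k, FUN = function(i) sum(A[i]))
    d <- abs(sums - target)
    j <- which.min(d)
    if (d[j] < best_dist) {
      best_dist <- d[j]
      # rebuild the j-th combination of size k from its indices
      best <- A[combn(seq_along(A), k)[, j]]
    }
  }
  best
}

A <- c(2, 5, 6, 3, 7)
closest_subset(A, 20)
# [1] 2 5 6 7
```

This still checks every subset, so it has the same cost as the original; it only removes the repetition.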

d.b
Tyelcie
  • Try with `lapply(1:5, function(i) abs(colSums(combn(A, i))-20))` – akrun Jun 17 '17 at 15:31
  • I think the first two blocks of code can be replaced with `Y <- lapply(1:5, function(i) combn(A, i)); S <- lapply(Y, function(x) abs(colSums(x) - 20))`, and then you can apply the rest of your code – akrun Jun 17 '17 at 15:41
  • How big will your real `A` be? For a large vector, your code will not finish in reasonable time, since you are testing all combinations one by one. If the length is 5, as in this example, there are only 32 combinations to check (32 = 2^5). If the size is 20, there are 1048576 combinations, which should finish in a few minutes. For 50, it is almost hopeless. If you are working with a large `A`, you will need a cleverer algorithm. – Kota Mori Jun 17 '17 at 18:02
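The figures in that comment follow from counting: brute force visits sum(choose(n, k)) for k = 1, ..., n, which equals 2^n - 1 non-empty subsets (the 2^n figure also counts the empty set). A quick check in R, with n taken from the sizes mentioned in the comment:

```r
# Non-empty subsets visited by brute force for n items
n <- c(5, 20, 50)
counts <- sapply(n, function(m) sum(choose(m, seq_len(m))))
counts
# the closed form 2^n - 1 gives the same numbers
all.equal(counts, 2^n - 1)
```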

2 Answers


Here is another way to do it:

l1 <- lapply(seq_along(A), function(i) combn(A, i))
l2 <- lapply(l1, function(i) abs(colSums(i) - 20))
m <- Reduce(min, l2)

# for each subset size, keep the columns whose distance equals the overall minimum
Filter(length, Map(function(x, y) x[, y], l1, lapply(l2, function(i) i == m)))
#[[1]]
#[1] 2 5 6 7

The last line uses Map() to index each matrix in l1 with a logical vector marking which of its columns reach the overall minimum distance in l2; Filter(length, ...) then drops the subset sizes with no match.
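For comparison, here is a sketch of the same selection without the logical list: flatten the distances with unlist() and record, for every entry, the subset size and column it came from. All ties are then returned directly. The names target, d, size and col are illustrative.

```r
A <- c(2, 5, 6, 3, 7)
target <- 20

# subsets grouped by size, one matrix per size
l1 <- lapply(seq_along(A), function(i) combn(A, i))
d <- lapply(l1, function(m) abs(colSums(m) - target))

# flatten the distances and remember where each entry came from
dist <- unlist(d)
size <- rep(seq_along(d), lengths(d))   # subset size of each entry
col <- sequence(lengths(d))             # column within that size's matrix

hits <- which(dist == min(dist))
res <- Map(function(k, j) l1[[k]][, j], size[hits], col[hits])
res
```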

Sotos

The combiter package has an isubsetv iterator, which goes through all subsets of a vector. Combined with foreach, it simplifies the code.

library(combiter)
library(foreach)
A <- c(2,5,6,3,7)

res <- foreach(x = isubsetv(A), .combine = c) %do% sum(x)
absdif <- abs(res-20)
ind <- which(absdif==min(absdif))
as.list(isubsetv(A))[ind]
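If you would rather avoid the combiter and foreach dependencies, a base-R sketch of the same exhaustive search is to encode each non-empty subset as the binary digits of an integer 1, ..., 2^n - 1 and sum all subsets in one vectorized step. This is a swapped-in technique, not the one the answer uses. The names target, masks and best are illustrative, and intToBits() limits it to about 31 elements, which is already far beyond what brute force can finish.

```r
A <- c(2, 5, 6, 3, 7)
target <- 20
n <- length(A)

# column m of `masks` is TRUE where bit i of m is set, i.e. A[i] is in subset m
masks <- matrix(
  vapply(seq_len(2^n - 1),
         function(m) as.integer(intToBits(m))[seq_len(n)] == 1L,
         logical(n)),
  nrow = n)

# A is recycled down each n-row column, so colSums gives every subset sum
sums <- colSums(masks * A)
absdif <- abs(sums - target)
best <- which(absdif == min(absdif))
res <- lapply(best, function(j) A[masks[, j]])
res
```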
Kota Mori
  • Thank you for introducing new syntax to me! Does "%do%" mean multithreaded running? I didn't know it. You also mentioned an alternative algorithm. My real A won't have more than 20 items, but I'm still curious. Can you give me some pointers? – Tyelcie Jun 18 '17 at 07:19
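On the alternative algorithm: when A holds positive integers of modest size, a classic approach is subset-sum dynamic programming. It records which totals 0, ..., sum(A) are reachable, so the work grows with length(A) * sum(A) instead of 2^length(A). This is a minimal sketch under that assumption (positive whole numbers only), not code from either answer. The function name closest_subset_dp is hypothetical, and when several subsets tie it returns just one.

```r
# Subset-sum DP, assuming A contains positive integers.
closest_subset_dp <- function(A, target) {
  total <- sum(A)
  # reach[s + 1] is TRUE when some subset of the items seen so far sums to s
  reach <- c(TRUE, rep(FALSE, total))   # initially only the empty sum 0
  # from[s + 1] is the index of the item that first made sum s reachable
  from <- rep(NA_integer_, total + 1)
  for (i in seq_along(A)) {
    old <- which(reach) - 1             # sums reachable without item i
    new <- old + A[i]
    fresh <- new[!reach[new + 1]]       # sums that item i makes newly reachable
    reach[fresh + 1] <- TRUE
    from[fresh + 1] <- i
  }
  sums <- which(reach) - 1
  sums <- sums[sums > 0]                # exclude the empty subset
  best <- sums[which.min(abs(sums - target))]
  # walk the back-pointers; item indices strictly decrease, so none repeats
  picked <- integer(0)
  s <- best
  while (s > 0) {
    i <- from[s + 1]
    picked <- c(i, picked)
    s <- s - A[i]
  }
  A[picked]
}

closest_subset_dp(c(2, 5, 6, 3, 7), 20)
# [1] 2 5 6 7
```

The trade-off is memory and time proportional to sum(A), so this helps when the values are small integers, not when they are large or fractional.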