3

Sorry in advanced for the bad title, but I really didn't know how to word it succinctly.

I have a dataframe I'm playing around with where an item can be in any of 4 categories, not limited to 1. Here's an example of the dummy matrix I'm working with:

ID <- 1:7
A <- c(1,0,0,1,1,0,0)
B <- c(0,1,0,0,1,0,1)
C <- c(0,0,0,0,0,1,1)
D <- c(1,0,1,1,0,0,0)
A_B <- (A+B > 0)*1
C_D <- (C+D > 0)*1
Cost <- c(25, 52, 11, 75, 45, 5, 34)

df <- data.frame(ID, A, B, C, D, A_B, C_D, A_B_C_D = 1, Cost)
df

ID A B C D A_B C_D A_B_C_D Cost
1  1 0 0 1  1   1     1     25
2  0 1 0 0  1   0     1     52
3  0 0 0 1  0   1     1     11
4  1 0 0 1  1   1     1     75
5  1 1 0 0  1   0     1     45
6  0 0 1 0  0   1     1     5
7  0 1 1 0  1   1     1     34

I need for this data frame to be organized such that row 1 contains an A, row 2 a B, row 3 a C, row 4 a D, row 5 an A or B, row 6 a C or D, and row 7 whatever is left over. I can't use arrange since starting with desc(A) would automatically give 1, 4, 5. An acceptable solution to this problem would be:

Order <- c(4, 2, 7, 1, 5, 3, 6)
df[Order,]
df

ID A B C D A_B C_D A_B_C_D Cost
4  1 0 0 1   1   1       1   75
2  0 1 0 0   1   0       1   52
7  0 1 1 0   1   1       1   34
1  1 0 0 1   1   1       1   25
5  1 1 0 0   1   0       1   45
3  0 0 0 1   0   1       1   11
6  0 0 1 0   0   1       1    5

Essentially, the diagonal needs to be 7 straight ones, but I can't think of how to program it to sort correctly no matter the data set. I feel like this should be really easy but I'm just not seeing it. Would transposing make it easier?

Thanks in advance.

CoolGuyHasChillDay
  • 659
  • 1
  • 6
  • 21
  • What will be the logic for more number of rows? – akrun Dec 29 '17 at 06:22
  • @akrun there will only ever be 7 rows, just multiple iterations: they are the result of an `lpsolve` optimization. The optimization works such that I'll always be able to make the "right" order, I just need to streamline the ordering. Correct order is needed for the final output to work. Thanks – CoolGuyHasChillDay Dec 29 '17 at 06:59
  • Is there only ever one solution or can their be multiple ones? If the latter, do you care which one you get? You may have to implement a search algorithm that walks the tree of possible arrangements until it finds one that works. – Claus Wilke Dec 29 '17 at 07:11
  • @ClausWilke There can be multiple solutions to the ordering, and it doesn't matter the order so long as there's a 1 across the diagonal. Another algorithm was sort of what I was afraid of haha. Was hoping for someone smarter than I am to have an easy solution, but sounds like I'll get to work on an algorithm. Thanks for the help – CoolGuyHasChillDay Dec 29 '17 at 07:21

2 Answers2

2

One approach would be to use brute force, by getting all the permutations of row arrangements and checking which satisfy the diagonal expectation:

z <- apply(permute::allPerms(1:7), 1, function(x){
  mat <- as.matrix(df[,2:8])
  if(all(diag(mat[x,]) == rep(1,7))){
    return(df[x,])
  }
  })

then you can just remove the NULL values:

z <- Filter(Negate(is.null), z)

and get all the 88 solutions

length(z) #88

z[[5]] #random solution
#output

  ID A B C D A_B C_D A_B_C_D Cost
1  1 1 0 0 1   1   1       1   25
2  2 0 1 0 0   1   0       1   52
6  6 0 0 1 0   0   1       1    5
4  4 1 0 0 1   1   1       1   75
5  5 1 1 0 0   1   0       1   45
3  3 0 0 0 1   0   1       1   11
7  7 0 1 1 0   1   1       1   34

To just get the first matching permutation one can use a while loop:

perms <- permute::allPerms(1:7)
mat <- as.matrix(df[,2:8])
i <- 1
while (!all(diag(mat[perms[i,],])  == rep(1,7))) {
  i = i+1
}

df[perms[i,],]

#  ID A B C D A_B C_D A_B_C_D Cost
1  1 1 0 0 1   1   1       1   25
2  2 0 1 0 0   1   0       1   52
6  6 0 0 1 0   0   1       1    5
3  3 0 0 0 1   0   1       1   11
4  4 1 0 0 1   1   1       1   75
7  7 0 1 1 0   1   1       1   34
5  5 1 1 0 0   1   0       1   45

lets check the speed:

test <- function(df){
  z <- apply(permute::allPerms(1:7), 1, function(x){
    mat <- as.matrix(df[,2:8])
    if(all(diag(mat[x,]) == rep(1,7))){
      return(df[x,])
    }
  })
  z <- Filter(Negate(is.null), z)
  return(z)
}

test2 <- function(df){
  perms <- permute::allPerms(1:7)
  mat <- as.matrix(df[,2:8])
  i <- 1
  while (!all(diag(mat[perms[i,],])  == rep(1,7))) {
    i = i+1
  }
  df[perms[i,],]
}
microbenchmark::microbenchmark(b <- test(df), 
                           c <- test2(df), times = 10L)

    Unit: milliseconds
           expr       min        lq      mean   median        uq       max neval cld
  b <- test(df) 392.68257 396.81450 412.41600 401.0613 408.15582 509.77693    10   b
 c <- test2(df)  46.11754  46.92276  47.80778  47.3977  48.82543  50.05795    10  a 

not all that bad

missuse
  • 19,056
  • 3
  • 25
  • 47
  • This is great, thanks for the help. Just so I'm understanding `test2` on a higher level: basically what you're telling R to do is to make as many row-combinations as possible (with the specific orderings stored in `perms`). Then you're telling R to reorganize the `df` based on the specifications in `perms`. Once all the diagonals are equal to 1, stop the loop and return `df` in that order. – CoolGuyHasChillDay Dec 29 '17 at 21:11
  • shoot, my actual `df`s have one more column so it's an 8:8 dummy matrix. `permute::allPerms` cannot handle 1:8 ("number of possible permutations is too large"). `allPerms(1:5)` has 119 rows, `allPerms(1:6)` has 719 rows, and `allPerms(1:7)` has 5039 rows, so I'd suspect `allPerms(1:8)` to be around 38k rows. A 38000x8 matrix is generally fine for R so I'm not sure why it's getting stuck. Do you know if there's a workaround? Looking in CRAN right now... @missuse – CoolGuyHasChillDay Dec 29 '17 at 21:24
  • sorry for spamming you, but wanted to let you know I figured it out before you went searching. Adding `ctrl <- permute::how(maxperm = 50000` then `permute::allPerms(1:8, control = ctrl)` did the trick. Thanks again! – CoolGuyHasChillDay Dec 29 '17 at 21:34
  • @CoolGuyHasChillDay The difference in test and test2 is that in the first function R is testing all the permutations while in the second it is testing until it finds the first match, for your purposes it will be better because of the speed since it looks like the testing part takes a lot more than then making the permutations. I am glad you solved the 8! problem, 9! should work too, but 10! will probably be out of the question. I am still thinking of other solutions, if I find something commending I will update the post. – missuse Dec 30 '17 at 06:50
0

From the data you have posted there is no unique solution possible, as rows 1 and 4 have identical A to D column sequences. Otherwise it would seem to have been a straightforward exercise in the use of four-bit boolean patterns. I don't understand why you have a repeat of bit pattern 1001, unless this is a mistake you made in setting up the example data.

To explain why I'm puzzled, if rows 1 and 4 are reversed in your suggested ordering it does not invalidate your requirement that the diagonal be all 1s, yet it is clearly not the same ordering as before:

Order2 <- c(1, 2, 7, 4, 5, 3, 6)
df[Order2,]


   ID A B C D A_B C_D A_B_C_D Cost
    1 1 0 0 1   1   1       1   25
    2 0 1 0 0   1   0       1   52
    7 0 1 1 0   1   1       1   34
    4 1 0 0 1   1   1       1   75
    5 1 1 0 0   1   0       1   45
    3 0 0 0 1   0   1       1   11
    6 0 0 1 0   0   1       1    5

A non-unique solution can be determined using AND and OR combinations if you don't care about the ordering as such - it's an exercise in the use of truth tables (or in the application of combinational logic such as the use of De Morgan's Theorem).

Stewart Ross
  • 1,034
  • 1
  • 8
  • 10
  • Thanks for helping me out! I said "an acceptable answer is:" but I should have been more clear that there isn't a unique solution, so long as the diagonal is filled with 1's then it works. That being said I'll take a look at De Morgan's theorem, thanks for the resource. – CoolGuyHasChillDay Dec 29 '17 at 17:08