2

I am trying to cast a dataframe into an other one, see below for the examples:

> start = data.frame(Aa = c('A','A','A','A','a','a','a','a'),Bb = c('B','B','b','b','B','B','b','b'),Cc = c('C','c','C','c','C','c','C','c'),v=c(1,2,3,4,5,6,7,8))
> start
  Aa Bb Cc v
1  A  B  C 1
2  A  B  c 2
3  A  b  C 3
4  A  b  c 4
5  a  B  C 5
6  a  B  c 6
7  a  b  C 7
8  a  b  c 8

And I would like to have a data frame like this one:

1  A  B  3
2  A  b  7
3  a  B  11
4  a  b  15
5  B  C  6
6  B  c  8
7  b  C  10
8  b  c  12
9  A  C  4
10  A  c  6
11  a  C  12
12  a  c  14

Where line 1 is calculated because we have A-B-C -> 1 and A-B-c -> 2 so A-B -> 3

The fact is that I can imagine a solution with some for loops on the columns, but I need it to time efficient, I can have 100,000 rows and up to 100 columns so I need something fast, and I don't think that the for loop are really efficient in R.

Do you have any ideas?

Thanks you!

Reyflex
  • 23
  • 2
  • Just to add some clarity, Reyflex wants `sum(v)` for each pairwise combination of `Aa`, `Bb`, and `Cc`. So for the pair `B, C`, find every row where `Bb == B` and `Cc == C` and calculate `sum(v)` – Mako212 Jan 03 '18 at 17:28

1 Answers1

3

Perhaps you can use combn on the column names.

Here, I've used data.table for its efficient aggregation and for the convenience of rbindlist to put the data back together.

library(data.table)
setDT(start)

rbindlist(combn(names(start)[1:3], 2, FUN = function(x) {
  start[, sum(v), x]
}, simplify = FALSE))
#     Aa Bb V1
#  1:  A  B  3
#  2:  A  b  7
#  3:  a  B 11
#  4:  a  b 15
#  5:  A  C  4
#  6:  A  c  6
#  7:  a  C 12
#  8:  a  c 14
#  9:  B  C  6
# 10:  B  c  8
# 11:  b  C 10
# 12:  b  c 12
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
  • Perfect thanks a lot A5C1D2H2I1M1N2O1R2T1! Just a little things: after testing it it looks like it is 'fun' instead of 'FUN' on mine – Reyflex Jan 04 '18 at 09:11
  • @Reyflex, maybe you're using `combn` from the "combinat" package and not the one from base R's "utils"? That function seems to have a lower-case `fun` as opposed to the base version that has an all uppercase `FUN`.... – A5C1D2H2I1M1N2O1R2T1 Jan 04 '18 at 09:41