Data mutate in R

Question

I have a data frame like this:

Var1 Var2 value
x5   x1   2
x6   x1   6
x1   x2   2
x5   x2   2
x6   x2   4
x7   x2   3

And I'd like to reshape/aggregate/summarise it according to corresponding Var2 with minimum values of each pairs of Var1, just like that:

Var3 Var4 minvalue
x5   x6   2
x1   x5   2
x1   x6   2
x1   x7   2
x5   x6   2
x5   x7   2
x6   x7   3

If you combine Var1 values by corresponding Var2 values select minimum value. E.g x1 as Var2 is connects x5 and x6 as Var1 and value is min(x5,x6)=2

Any idea?

How are the two datasets related? This question needs clarifying as it is impossible to know what you are trying to achieve. — Michael Harper, Nov 01 '17 at 12:31
I think this is what you needed `do.call(rbind, lapply(split(df1[-2], df1$Var2), function(x) {x1 <- combn(x$Var1, 2, FUN = list); data.frame(do.call(rbind, x1),minvalue = sapply(x1, function(y) min(x$value[x$Var1 %in% y]))) }))` — akrun, Nov 01 '17 at 12:36
Or use `data.table` i.e. `setDT(df1)[, {x1 <- combn(Var1, 2, FUN = list); data.frame(do.call(rbind, x1), minvalue = sapply(x1, function(y) min(value[Var1 %in% y])), stringsAsFactors= FALSE)}, Var2]` — akrun, Nov 01 '17 at 12:42

score 1 · Accepted Answer · answered Nov 01 '17 at 13:11

Based on the description, we need to do a groupby combn to get the min 'value' of the 'pairs'. A base R option would be to split by the 'Var2' and then do the combn on 'Var1', get the min of 'value' by subsetting the corresponding values for each 'Var1'

res <- do.call(rbind, lapply(split(df1[-2], df1$Var2), function(x) {
       x1 <- combn(x$Var1, 2, FUN = list)
      data.frame(do.call(rbind, x1),minvalue = sapply(x1, function(y) 
                 min(x$value[x$Var1 %in% y]))) }))
row.names(res) <- NULL
res
#  X1 X2 minvalue
#1 x5 x6        2
#2 x1 x5        2
#3 x1 x6        2
#4 x1 x7        2
#5 x5 x6        2
#6 x5 x7        2
#7 x6 x7        3

Or we can use data.table, grouped by 'Var2', do the combn as earlier

library(data.table)
setDT(df1)[, {
         x1 <- combn(Var1, 2, FUN = list)
         data.frame(do.call(rbind, x1), 
              minvalue = sapply(x1, function(y) min(value[Var1 %in% y])),
             stringsAsFactors= FALSE)
           }, by = Var2]

data

df1 <- structure(list(Var1 = c("x5", "x6", "x1", "x5", "x6", "x7"), 
Var2 = c("x1", "x1", "x2", "x2", "x2", "x2"), value = c(2L, 
6L, 2L, 2L, 4L, 3L)), .Names = c("Var1", "Var2", "value"),
class = "data.frame", row.names = c(NA, -6L))

Thank's @akrun, great solutions. I tested both of them on a bigger dataset and working well. Thank's again. — lgadar, Nov 01 '17 at 13:48

score 1 · Answer 2 · answered Nov 01 '17 at 14:38

There is an alternative solution using data.table's non-equi join:

library(data.table)
setDT(df1)[, rn := .I][
  df1, on = .(Var2, rn < rn), nomatch = 0L, allow = TRUE,
  .(Var3 = Var1, Var4 = i.Var1, minvalue = pmin(value, i.value))]

   Var3 Var4 minvalue
1:   x5   x6        2
2:   x1   x5        2
3:   x1   x6        2
4:   x5   x6        2
5:   x1   x7        2
6:   x5   x7        2
7:   x6   x7        3

The join on Var2 is equivalent to grouping by Var2 while the join condition rn < rn on the row number replaces combn().

I wonder how this code compares to akrun's solutions in terms of speed if applied to a bigger data set.

Data mutate in R

2 Answers2

data