3

We have the mtcars dataset and want to compute a function between different cars in groups taking several columns as input.

More specifically:

  1. Group by transmission am and gears gear
  2. Compare all cars within the same group with each other (car1 to car2, car1 to car3, car2 to car3...)
  3. Does the first car have more displacement than the second car? disp1>disp2
  4. AND does the first car have more horsepower than the second car? hp1>hp2

The result should look like this:

# Car1           Car2           result
# Mazda RX4      Mazda RX4 Wag  false
# Mazda RX4      Datsun 710     false 
# Mazda RX4 Wag  Datsun 710     true

In this question is an outer operation by group, but only a single column is used. Is it possible to expand this to several columns? If possible, it would be nice to have the disp1, disp2 etc. columns in the result table.

JaBe
  • 664
  • 1
  • 8
  • 27

2 Answers2

4

This is a bit messy, but you could do the following:

library(dplyr)
library(tidyr)

dat <- mtcars %>%
    tibble::rownames_to_column(var = "Car1") %>%
    mutate(Car2 = Car1) %>%
    group_by(am, gear) %>%
    expand(Car1, Car2) %>%
    left_join(tibble::rownames_to_column(mtcars, var = "Car1"), by = "Car1") %>%
    left_join(tibble::rownames_to_column(mtcars, var = "Car2"), by = "Car2") %>%
    ungroup() %>%
    mutate(result = disp.x > disp.y & hp.x > hp.y) %>%
    select(Car1, Car2, result)

Then we get the expected results:

dat

# A tibble: 330 x 3
   Car1        Car2                result
   <chr>       <chr>               <lgl> 
 1 AMC Javelin AMC Javelin         FALSE 
 2 AMC Javelin Cadillac Fleetwood  FALSE 
 3 AMC Javelin Camaro Z28          FALSE 
 4 AMC Javelin Chrysler Imperial   FALSE 
 5 AMC Javelin Dodge Challenger    FALSE 
 6 AMC Javelin Duster 360          FALSE 
 7 AMC Javelin Hornet 4 Drive      TRUE  
 8 AMC Javelin Hornet Sportabout   FALSE 
 9 AMC Javelin Lincoln Continental FALSE 
10 AMC Javelin Merc 450SE          FALSE 
# … with 320 more rows

dat %>% filter(Car1 == "Mazda RX4")

# A tibble: 8 x 3
  Car1      Car2           result
  <chr>     <chr>          <lgl> 
1 Mazda RX4 Datsun 710     TRUE  
2 Mazda RX4 Fiat 128       TRUE  
3 Mazda RX4 Fiat X1-9      TRUE  
4 Mazda RX4 Honda Civic    TRUE  
5 Mazda RX4 Mazda RX4      FALSE 
6 Mazda RX4 Mazda RX4 Wag  FALSE 
7 Mazda RX4 Toyota Corolla TRUE  
8 Mazda RX4 Volvo 142E     TRUE

(While the Mazda RX4 to Datsun 710 comparison differs from your stated expected results, it is in fact correct:)

mtcars %>%
    tibble::rownames_to_column(var = "Car1") %>%
    filter(Car1 %in% c("Mazda RX4", "Datsun 710"))
        Car1  mpg cyl disp  hp drat   wt  qsec vs am gear carb
1  Mazda RX4 21.0   6  160 110 3.90 2.62 16.46  0  1    4    4
2 Datsun 710 22.8   4  108  93 3.85 2.32 18.61  1  1    4    1
duckmayr
  • 16,303
  • 3
  • 35
  • 53
2

Here's a slightly more succinct (if no less messy) method using outer, as requested, and showing the comparative hp and disp valuess. It also utilizes reshape2::melt:

library(dplyr)

f <- function(x) {
  y <- reshape2::melt(outer(x$disp, x$disp, `>`) & outer(x$hp, x$hp, `>`))
  y <- y[y[[1]] != y[[2]],]
  tibble(Car1 = rownames(x)[y[[1]]], Car2 = rownames(x)[y[[2]]],
         disp1 = x$disp[y[[1]]], disp2 = x$disp[y[[2]]],
         hp1 = x$hp[y[[1]]], hp2 = x$hp[y[[2]]], result = y[[3]])
}

result_list <- mtcars %>% split(paste(.$gear, .$am)) %>% lapply(f)

The resulting list has one data frame for each gear / am combo. I've just shown the second one here for brevity:

result_list[2]
$`4 0`
# A tibble: 12 x 7
   Car1      Car2      disp1 disp2   hp1   hp2 result
   <chr>     <chr>     <dbl> <dbl> <dbl> <dbl> <lgl> 
 1 Merc 230  Merc 240D  141.  147.    95    62 FALSE 
 2 Merc 280  Merc 240D  168.  147.   123    62 TRUE  
 3 Merc 280C Merc 240D  168.  147.   123    62 TRUE  
 4 Merc 240D Merc 230   147.  141.    62    95 FALSE 
 5 Merc 280  Merc 230   168.  141.   123    95 TRUE  
 6 Merc 280C Merc 230   168.  141.   123    95 TRUE  
 7 Merc 240D Merc 280   147.  168.    62   123 FALSE 
 8 Merc 230  Merc 280   141.  168.    95   123 FALSE 
 9 Merc 280C Merc 280   168.  168.   123   123 FALSE 
10 Merc 240D Merc 280C  147.  168.    62   123 FALSE 
11 Merc 230  Merc 280C  141.  168.    95   123 FALSE 
12 Merc 280  Merc 280C  168.  168.   123   123 FALSE 
Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
  • Thank you. This solution worked well for me, without memory problems. With `%>% filter(result>0)` I could filter for only the true results which I need. Could you elaborate on how you could apply the function on the individual combos and how you transformed from matrix to the tibble? – JaBe May 04 '20 at 14:13
  • 1
    The matrix is converted to a data frame using `reshape::melt`, which converts the n x n matrix into long format, of n^2 rows, one for each combination, with the row numbers of the matrix in column 1, the column numbers in column 2, and the matrix entry for that combination in column 3. We can therefore get the first car's names by indexing the row names by column 1, and the second by indexing row names by column 2. We get the values for hp and disp the same way, by indexing mtcars$hp and mtcars$hp. The result is just the result of the final column in `melt`'s output. – Allan Cameron May 04 '20 at 14:33
  • The function `outer(n, m, f)` creates a length(m) x length(n) matrix where each entry is the result of `f(m[i], n[j])`, where `i` is the row of the resultant matrix, and `j` is the column. In this case, the function `f` was the operator `<`. Since there were two matrices created by the two `outer` functions, these were combined into a single matrix by the boolean operator `&`. – Allan Cameron May 04 '20 at 14:42