Finding combined sums of pairs of rows in R

Question

I have a dataframe with test results (rows being Players; and columns Q1 ... Q6 being the different questions). Now I want to find out which pair of players scored the highest sum in total:

# Generating sample data.
n = 6

set.seed(1986)

results_df = data.frame(Player = c("A", "B", "C", "D", "E", "F"), 
                     Q1 = sample(0:1, size = n, replace = TRUE), 
                     Q2 = sample(0:1, size = n, replace = TRUE),
                     Q3 = sample(0:1, size = n, replace = TRUE),
                     Q4 = sample(0:1, size = n, replace = TRUE),
                     Q5 = sample(0:1, size = n, replace = TRUE),
                     Q6 = sample(0:1, size = n , replace = TRUE))


head(results_df)

  Player Q1 Q2 Q3 Q4 Q5 Q6
1      A  1  0  1  0  0  0
2      B  1  1  0  0  0  0
3      C  0  1  0  1  0  1
4      D  0  1  1  0  1  1
5      E  1  1  1  1  1  1
6      F  1  0  0  1  0  1

The 1's and 0's are dummies for whether each player got their question right (1) or wrong (0). Now I would like to combine each pair of players to see how well they would have done it as a pair.

Does anyone know how I can transform the dataframe above to something looking like this below?

(Here I have just summed each combination of pairs by hand: A had 3 right, combined with B who had 3 questions right that A had wrong, would make a combination of 6, and so on...)

  Player  A  B  C  D  E  F
1      A  2  3  5  5  6  4
2      B  3  2  4  5  6  4
3      C  5  4  3  5  6  4
4      D  5  5  5  4  6  6
5      E  6  6  6  6  6  6
6      F  4  4  4  6  6  3

@Onyambu Edited the post now, trying to explain better what I have done. — Methi, Feb 04 '22 at 14:37
B has 2 right that A got wrong not 3. meaning it should be 5 and not 6 — Onyambu, Feb 04 '22 at 14:37
@utubun I'll fix it, but then I need to recalculate the combinations again by hand. — Methi, Feb 04 '22 at 14:37
@Onyambu True. My math skills are off too, not just my R skills ;-) — Methi, Feb 04 '22 at 14:39
@Methi Yes, it what I meant. But don't do that, it won't work anyway for each column. The question is clear: for each pair you want sum of all scores of player `x` vs player `y` — utubun, Feb 04 '22 at 14:43

score 2 · Accepted Answer · answered Feb 04 '22 at 21:53

A base R option with outer

> lst <- asplit(`row.names<-`(as.matrix(results_df[-1]), results_df$Player), 1)

> outer(lst, lst, FUN = Vectorize(function(x, y) sum(x + y > 0)))
  A B C D E F
A 2 3 5 5 6 4
B 3 2 4 5 6 4
C 5 4 3 5 6 4
D 5 5 5 4 6 6
E 6 6 6 6 6 6
F 4 4 4 6 6 3

Onyambu · Answer 2 · 2022-02-04T15:29:06.143

1

in base R you could do:

a <- data.frame(t(as.matrix(results_df[-1])))
b <- combn(a, 2, function(x)sum(x[1] | x[2]))
attributes(b) <- list(Size = ncol(a), Labels = results_df$Player)
d <- as.matrix(structure(b, class = 'dist'))
diag(d) <- colSums(a)
d
  A B C D E F
A 2 3 5 5 6 4
B 3 2 4 5 6 4
C 5 4 3 5 6 4
D 5 5 5 4 6 6
E 6 6 6 6 6 6
F 4 4 4 6 6 3

edited Feb 04 '22 at 15:29

answered Feb 04 '22 at 15:02

Onyambu

67,392
3
24
53

Andre Wildberg · Answer 3 · 2022-02-04T22:39:20.080

1

You can use this to get the sums

n <- 6

# get the combinations
ee <- expand.grid(1:n, 1:n)
matrix(rowSums(
  results_df[,-1][ee[,1],] | results_df[,-1][ee[,2],]), n, 
  dimnames = list(results_df[,1], results_df[,1]))
  A B C D E F
A 2 3 5 5 6 4
B 3 2 4 5 6 4
C 5 4 3 5 6 4
D 5 5 5 4 6 6
E 6 6 6 6 6 6
F 4 4 4 6 6 3

edited Feb 04 '22 at 22:39

answered Feb 04 '22 at 15:06

Andre Wildberg

12,344
3
12
29

There are 6 questions, the maximum a group can get is 6. Your results has numbers greater than 6 – Onyambu Feb 04 '22 at 15:07
@Onyambu I think I misread the question, gonna check, THX! – Andre Wildberg Feb 04 '22 at 15:10
EDIT made changes, should work now – Andre Wildberg Feb 04 '22 at 15:22
1

a comment: You dont have to take it into consideration: `(res <- matrix( rowSums(results_df[,-1][ee[,1],] | results_df[,-1][ee[,2],]), n, dimnames = list(results_df[,1], results_df[,1])))`. In this case, you do not have to use `setNames` and `rownames`. – Onyambu Feb 04 '22 at 15:35

utubun · Answer 4 · 2022-02-04T16:14:20.943

With this input data:

#   Player Q1 Q2 Q3 Q4 Q5 Q6
#1      A  0  1  1  0  1  1
#2      B  0  0  1  0  1  1
#3      C  1  1  0  0  0  0
#4      D  0  1  0  1  1  1
#5      E  1  1  0  1  0  1
#6      F  0  0  0  1  0  1

There is simple base solution:

scr <- rowSums(dat[, -1])                           # 1)
res <- data.frame(outer(scr, scr, '+') - diag(scr)) # 2)
dimnames(res) <- dat[, c(1, 1)]                     # 3)

Find sums of the scores for each row (i.e. Player);
Find sums between each pair of scores obtained in (1). From the diagonal subtract original scores, so e.g. for pair A-A score original number obtained in (1), not a double of it;
Set row- and column-names to player names;

Which gives you this result:

#   A B C D E F
# A 4 7 6 8 8 6
# B 7 3 5 7 7 5
# C 6 5 2 6 6 4
# D 8 7 6 4 8 6
# E 8 7 6 8 4 6
# F 6 5 4 6 6 2

Data:

dat <- structure(
  list(
    Player = c("A", "B", "C", "D", "E", "F"),
    Q1     = c(0, 0, 1, 0, 1, 0),
    Q2     = c(1, 0, 1, 1, 1, 0),
    Q3     = c(1, 1, 0, 0, 0, 0),
    Q4     = c(0, 0, 0, 1, 1, 1),
    Q5     = c(1, 1, 0, 1, 0, 0),
    Q6     = c(1, 1, 0, 1, 1, 1)
  ),
  row.names = c(NA,-6L),
  class = "data.frame"
)

score -1 · Answer 5 · answered Feb 04 '22 at 14:42

So here is the code to compute all player's combined score. I dont know, why you need them in a matrix form but using this you should be able to create the matrix. The solution is to use tidyr::pivot_longer() and afterwards dplyr.

# Generating sample data.
set.seed(1986)

n <- 6
results_df <- data.frame(
  Player = c("A", "B", "C", "D", "E", "F"),
  Q1 = sample(0:1, size = n, replace = TRUE),
  Q2 = sample(0:1, size = n, replace = TRUE),
  Q3 = sample(0:1, size = n, replace = TRUE),
  Q4 = sample(0:1, size = n, replace = TRUE),
  Q5 = sample(0:1, size = n, replace = TRUE),
  Q6 = sample(0:1, size = n, replace = TRUE)
)


results_df
#>   Player Q1 Q2 Q3 Q4 Q5 Q6
#> 1      A  1  0  1  0  0  0
#> 2      B  1  1  0  0  0  0
#> 3      C  0  1  0  1  0  1
#> 4      D  0  1  1  0  1  1
#> 5      E  1  1  1  1  1  1
#> 6      F  1  0  0  1  0  1

results_df |>
  tidyr::pivot_longer(cols = tidyselect::starts_with("Q"), names_to = "question", values_to = "score") |>
  dplyr::group_by(Player) |>
  dplyr::summarise(total = sum(score))
#> # A tibble: 6 x 2
#>   Player total
#>   <chr>  <int>
#> 1 A          2
#> 2 B          2
#> 3 C          3
#> 4 D          4
#> 5 E          6
#> 6 F          3

^{Created on 2022-02-04 by the reprex package (v2.0.1)}

This only sums each players total score. I want to know the combination of points that A and B got together, A and C, A and D, and so forth. — Methi, Feb 04 '22 at 14:47

Finding combined sums of pairs of rows in R

5 Answers5