Is there a tidyverse approach to find paired rows in a data frame?

Question

I have

df <-   data.frame(id = c(c(letters[1:4]), c(LETTERS[5:8])),
                     group = c(rep("same", length = 4), rep("opp", length = 4)),            
                     match = c("H", "G", "E", "F", "c", "d", "b", "a"))

where each id (row) is uniquely paired with one other id in the list. I am hoping to find a tidyverse solution that creates an indicator column identifying each unique pairing, with a sequential number assigned to each pair. The result would be the output of

df <-   data.frame(pair = c(1,2,3,4,3,4,2,1), id = c(c(letters[1:4]), c(LETTERS[5:8])),
                     group = c(rep("same", length = 4), rep("opp", length = 4)),            
                     match = c("H", "G", "E", "F", "c", "d", "b", "a"))

Onyambu · Accepted Answer · 2023-07-26T21:38:10.423

In the case of a one-to-one relationship:

library(dplyr)
df %>% group_by(pair = pmin(id, match)) %>% mutate(pair = cur_group_id())

# A tibble: 8 × 4
# Groups:   pair [4]
  id    group match  pair
  <chr> <chr> <chr> <int>
1 a     same  H         1
2 b     same  G         2
3 c     same  E         3
4 d     same  F         4
5 E     opp   c         3
6 F     opp   d         4
7 G     opp   b         2
8 H     opp   a         1

or

mutate(df, pair = match(pmin(id, match), id))

In base R:

transform(df, pair = match(pmin(id, match), id))
  id group match pair
1  a  same     H    1
2  b  same     G    2
3  c  same     E    3
4  d  same     F    4
5  E   opp     c    3
6  F   opp     d    4
7  G   opp     b    2
8  H   opp     a    1
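
Both variants above number the pairs according to which ID pmin() picks, and string comparison in R depends on the collation locale, so the exact numbers may differ between systems. A minimal base R sketch that instead numbers the pairs by first appearance, assuming a strict one-to-one pairing as in the question (the variable name key is only illustrative):

```r
# Data from the question
df <- data.frame(
  id    = c(letters[1:4], LETTERS[5:8]),
  group = rep(c("same", "opp"), each = 4),
  match = c("H", "G", "E", "F", "c", "d", "b", "a"),
  stringsAsFactors = FALSE
)

# Both rows of a pair yield the same key, whichever ID happens to sort first
key <- pmin(df$id, df$match)

# Position of each key among the distinct keys, in order of first appearance
df$pair <- match(key, unique(key))
df$pair
```

For the question's data this should give 1, 2, 3, 4, 3, 4, 2, 1, which is the numbering requested.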

In the case of a one-to-many relationship:

df %>%
  group_by(pair = paste(pmin(id, match), pmax(id, match))) %>%
  mutate(pair = cur_group_id())

# A tibble: 8 × 4
# Groups:   pair [4]
  id    group match  pair
  <chr> <chr> <chr> <int>
1 a     same  H         1
2 b     same  G         2
3 c     same  E         3
4 d     same  F         4
5 E     opp   c         3
6 F     opp   d         4
7 G     opp   b         2
8 H     opp   a         1
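
To see why pmax() is needed once an id can take part in more than one pair, here is a small base R sketch on made-up data (the vectors ids and partners are hypothetical and not from the question). With pmin() alone, every pair that involves "a" collapses into one group; pasting pmin() and pmax() together keeps the pairs apart:

```r
# Hypothetical data: "a" is paired with both "x" and "y"
ids      <- c("a", "a", "x", "y")
partners <- c("x", "y", "a", "a")

# pmin() alone returns "a" for every row, so all rows share one key
key_min  <- pmin(ids, partners)
pair_min <- match(key_min, unique(key_min))

# pmin() plus pmax() gives one key per unordered pair
key_both  <- paste(pmin(ids, partners), pmax(ids, partners))
pair_both <- match(key_both, unique(key_both))

pair_min   # 1 1 1 1
pair_both  # 1 2 1 2
```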

@jpsmith I had just restricted it to a one-to-one relationship. I have edited it to cover the case of a one-to-many relationship — Onyambu, Jul 26 '23 at 21:39
Great solution and so simple. Thanks for the opportunity to learn about pmin/pmax and cur_group_id — marcel, Jul 26 '23 at 22:10

jpsmith · Answer 2 · 2023-07-26T21:35:52.997

There may be more elegant ways, but you can achieve your desired output in two mutate steps: the first creates a temporary grouping variable (tmp), and the second uses cur_group_id to assign group numbers based on tmp. The temporary column is then removed with select:

# The .by argument requires dplyr 1.1.0 or later
df %>%
  mutate(tmp = ifelse(group %in% "same", paste0(id, match), paste0(match, id))) %>%
  mutate(pair = cur_group_id(), .by = tmp) %>%
  select(-tmp)

Output

  id group match pair
1  a  same     H    1
2  b  same     G    2
3  c  same     E    3
4  d  same     F    4
5  E   opp     c    3
6  F   opp     d    4
7  G   opp     b    2
8  H   opp     a    1
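
Whichever approach is used, the result can be sanity-checked against the question's premise that each id is paired with exactly one other id. A rough base R sketch, where check_pairs is a hypothetical helper written for this illustration and not part of any package:

```r
# Hypothetical helper: TRUE if every pair number is used exactly twice
# and every row's partner points back to that row
check_pairs <- function(id, partner, pair) {
  used_twice <- all(table(pair) == 2)
  # Look up each partner's own row and confirm its partner is the original id
  reciprocal <- isTRUE(all(partner[match(partner, id)] == id))
  used_twice && reciprocal
}

id      <- c(letters[1:4], LETTERS[5:8])
partner <- c("H", "G", "E", "F", "c", "d", "b", "a")

ok_result  <- check_pairs(id, partner, c(1, 2, 3, 4, 3, 4, 2, 1))
bad_result <- check_pairs(id, partner, c(1, 1, 1, 1, 2, 2, 2, 2))
```

Here ok_result should be TRUE for the expected pairing, while bad_result should be FALSE because each number is used four times.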

LMc · Answer 3 · answered Jul 26 '23 at 21:41

library(tidyverse)

df |>
  group_by(pair = map2_chr(id, match, ~ str_flatten(sort(c(.x, .y))))) |>
  mutate(pair = cur_group_id()) |>
  ungroup()

LMc


Andre Wildberg · Answer 4 · 2023-07-26T22:27:56.960

An approach using factor():

df %>% 
  rowwise() %>% 
  mutate(pair = factor(paste(sort(c(id, match)), collapse = "")), 
         pair = as.numeric(pair)) %>% 
  ungroup()
# A tibble: 8 × 4
  id    group match  pair
  <chr> <chr> <chr> <dbl>
1 a     same  H         1
2 b     same  G         2
3 c     same  E         3
4 d     same  F         4
5 E     opp   c         3
6 F     opp   d         4
7 G     opp   b         2
8 H     opp   a         1
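
The factor idea can also be written without rowwise(). The following is a base R sketch of the same technique, not the answer's own code: it builds one sorted key per row with mapply() and then reads off the factor's integer codes. By default factor() sorts its levels, so levels = unique(key) is passed here on the assumption that numbering by first appearance is wanted:

```r
# Data from the question
df <- data.frame(
  id    = c(letters[1:4], LETTERS[5:8]),
  group = rep(c("same", "opp"), each = 4),
  match = c("H", "G", "E", "F", "c", "d", "b", "a"),
  stringsAsFactors = FALSE
)

# One key per row: the two IDs sorted and pasted together
key <- mapply(
  function(a, b) paste(sort(c(a, b)), collapse = ""),
  df$id, df$match,
  USE.NAMES = FALSE
)

# Integer codes of a factor whose levels follow first appearance
df$pair <- as.integer(factor(key, levels = unique(key)))
df$pair
```

Using as.integer() rather than as.numeric() keeps the pair column as an integer, matching the <int> column produced by cur_group_id().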
