0

Pretty new to pivot_wider, I'm having trouble converting this table into wider format (reproducible data below):

ABC
  A_gazes_to   A_dur B_gazes_to   B_dur C_gazes_to   C_dur
1       <NA> 1694924          A  672705          A 1214402
2          B  608329       <NA> 1965078          B  178837
3          C  474406          C  126447       <NA> 1384342

This transformation is merely a first start:

library(tidyverse)
ABC %>% 
  pivot_longer(cols = c(A_dur, B_dur, C_dur),
               names_to = "Speaker",
               values_to = "Duration") %>%
  arrange(Speaker) %>%
  mutate(Speaker = sub("(.).*", "\\1", Speaker))
# A tibble: 9 x 5
  A_gazes_to B_gazes_to C_gazes_to Speaker Duration
  <chr>      <chr>      <chr>      <chr>      <int>
1 NA         A          A          A        1694924
2 B          NA         B          A         608329
3 C          C          NA         A         474406
4 NA         A          A          B         672705
5 B          NA         B          B        1965078
6 C          C          NA         B         126447
7 NA         A          A          C        1214402
8 B          NA         B          C         178837
9 C          C          NA         C        1384342

What I don't know how to do is summarize the _gazes_to columns, like this:

Expected result:

# A tibble: 9 x 3
  Speaker Duration Gazes_to
  <chr>      <int> <chr>   
1 A        1694924 NA      
2 A         608329 B       
3 A         474406 C       
4 B         672705 A       
5 B        1965078 NA      
6 B         126447 C       
7 C        1214402 A       
8 C         178837 B       
9 C        1384342 NA 

Reproducible data:

structure(list(A_gazes_to = c(NA, "B", "C"), A_dur = c(1694924L, 
608329L, 474406L), B_gazes_to = c("A", NA, "C"), B_dur = c(672705L, 
1965078L, 126447L), C_gazes_to = c("A", "B", NA), C_dur = c(1214402L, 
178837L, 1384342L)), class = "data.frame", row.names = c(NA, 
3L))
Chris Ruehlemann
  • 20,321
  • 4
  • 12
  • 34

1 Answers1

1

You can use pivot_longer with names_pattern :

tidyr::pivot_longer(df, 
                    cols = everything(), 
                    names_to = c('Speaker', '.value'), 
                    names_pattern = '([A-Z]+)_(.*)')

# Speaker gazes_to     dur
#  <chr>   <chr>      <int>
#1 A       NA       1694924
#2 B       A         672705
#3 C       A        1214402
#4 A       B         608329
#5 B       NA       1965078
#6 C       B         178837
#7 A       C         474406
#8 B       C         126447
#9 C       NA       1384342
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213