2

I have two data frames where the data are connected via the "ID" label as follows:

test_frame1<- data.frame(
  ID = c("A","B","C"),
  ElementType1 = c(1,6,1),
  ElementType2= c(4,4,5),
  ElementType3 = c('',6,1),
  Notes = c("Something random","","Something else random")
)

test_frame2<-data.frame(
  ID = c("A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","C","C","C","C","C","C","C","C","C"),
  Syllable = c(1,1,2,2,3,3,1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3),
  ElementID = c(1,2,1,2,1,2,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
)

I want to add a new column to test_frame2 to represent the element type, as described by the middle columns in test_frame1. I want these element types to repeat for each different syllable as follows:

desired_frame<-data.frame(
  ID = c("A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","C","C","C","C","C","C","C","C","C"),
  Syllable = c(1,1,2,2,3,3,1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3),
  ElementID = c(1,2,1,2,1,2,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3),
  ElementType= c(1,4,1,4,1,4,6,4,6,6,4,6,6,4,6,1,5,1,1,5,1,1,5,1)
)

I first transposed test_frame1 so that the element types are in the same column, with ID being the column names. Then I tried to separate test_frame2 into a list grouped by ID and Syllable. At this point I hope to merge, cbind, or in some other way paste the values in the element type columns from test_frame1 to the corresponding list item for that ID into test_frame2. It seems like a 'for loop' might help with this, too. I am struggling, though, to find the right function to merge a list and multiple columns based on one part of the list elements title. Any ideas on a simply way to go about this?

Katie S
  • 23
  • 4

2 Answers2

3

You could pivot test_frame1 to long format, and then do a left_join from test_frame2 onto the result:

library(tidyverse)

desired_frame <- test_frame2 %>%
  left_join(test_frame1 %>%
    mutate(across(starts_with("Element"), as.numeric)) %>%
    pivot_longer( starts_with("Element"), names_pattern = "(\\d+)",
               names_to = "ElementID", values_to = "ElementType") %>%
    mutate(ElementID = as.numeric(ElementID)), by = c("ID", "ElementID")) %>%
  select(-Notes)

Resulting in

desired_frame
#>    ID Syllable ElementID ElementType
#> 1   A        1         1           1
#> 2   A        1         2           4
#> 3   A        2         1           1
#> 4   A        2         2           4
#> 5   A        3         1           1
#> 6   A        3         2           4
#> 7   B        1         1           6
#> 8   B        1         2           4
#> 9   B        1         3           6
#> 10  B        2         1           6
#> 11  B        2         2           4
#> 12  B        2         3           6
#> 13  B        3         1           6
#> 14  B        3         2           4
#> 15  B        3         3           6
#> 16  C        1         1           1
#> 17  C        1         2           5
#> 18  C        1         3           1
#> 19  C        2         1           1
#> 20  C        2         2           5
#> 21  C        2         3           1
#> 22  C        3         1           1
#> 23  C        3         2           5
#> 24  C        3         3           1

Created on 2023-08-01 with reprex v2.0.2

Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
  • 1
    Just an FYI, you can use `names_transform = list(ElementID = as.numeric)` in your `pivot_longer` and then you wouldn't need the last `mutate`. – LMc Aug 01 '23 at 21:54
  • @LMc that's part of the left join, not part of the mutate – Allan Cameron Aug 01 '23 at 21:56
  • Yup I caught that a bit late! – LMc Aug 01 '23 at 21:57
  • 1
    @LMc I guess names_transform is the proper way to do it, but adding a mutate call instead has marginally fewer characters. – Allan Cameron Aug 01 '23 at 21:59
  • Thank you so much, @Allan! The `pivot_longer`piece was key for me there. This does exactly what I want! The only issue was that I accidentally made the test_frame2 a bit different from my actual data. See below how I tweeked your code to come to a new solution. – Katie S Aug 02 '23 at 15:37
0

In case this helps anyone in the future...I realized that I mistakenly made the test data frames above slightly different from my actual data. Here is how my test_frame2 is really set up:

 test_frame2<-data.frame(
      ID = c("A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","C","C","C","C","C","C","C","C","C"),
      Syllable = c(1,1,2,2,3,3,1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3),
      ElementNumber = c(1,2,3,4,5,6,1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9)
    )

Thus, the original solution from Allan would not work because there is no matching ElementID column in both dfs. To get around this I added an ElementID column by grouping by ID and syllable, then adding row numbers. After that the rest of the code is the same as Allan's above:

 desired_framex <- test_frame2 %>% 
    group_by(ID,Syllable) %>% 
    mutate(ElementID = row_number()) %>%
      left_join(test_frame1 %>%
                  mutate(across(starts_with("Element"), as.numeric)) %>%
                  pivot_longer( starts_with("Element"), names_pattern = "(\\d+)",
                                names_to = "ElementID", values_to = "ElementType") %>%
                  mutate(ElementID = as.numeric(ElementID)), by = c("ID","ElementID")) %>%
      select(-Notes)
    desired_framex<-as.data.frame(ungroup(desired_framex))
Katie S
  • 23
  • 4