Duplicated rows after reshaping and joining

Question

I am analyzing the student test data, but before that I want to tidy it. I tried to build a tidy data frame but it seems like the rows are duplicating. Below is my code.

library(tidyverse)
(Test <- tibble(
  Student = c("A", "B", "C", "D", "E"),
  Test1 = c("SAT", "SAT", "SAT", "SAT", "SAT"),
  Test2 = c("NA", "ACT", "ACT", "ACT", "ACT"),
  testdate1 = c("7/1/2017", "6/1/2017", "3/1/2017", "2/17/2018", "NA"),
  testdate2 = c("NA", "NA", "1/1/2016", "12/1/2016", "10/1/2016")
))


(Testa <- tibble(
  Student = c("A", "B", "C", "D", "E"),
  Test1 = c("SAT", "SAT", "SAT", "SAT", "SAT"),
  Test2 = c("NA", "ACT", "ACT", "ACT", "ACT")
))


(Testb <- tibble(
  Student = c("A", "B", "C", "D", "E"),
  testdate1 = c("7/1/2017", "6/1/2017", "3/1/2017", "2/17/2018", "NA"),
  testdate2 = c("NA", "NA", "1/1/2016", "12/1/2016", "10/1/2016")
))

(td1 <- Testa %>% 
    gather(Test1, Test2, key = "Test", value = "Score"))
(td2 <- Testb %>% 
    gather(testdate1, testdate2, key = "Dated", value = "Datev"))
(tidy <- left_join(td1, td2))

Can anyone please help me solve this issue. Below is the image of how I want to see the data.

If the ids are not unique in each of the joining datasets, it would result in duplication — akrun, Oct 29 '18 at 15:37
Similar for `base::merge`: [Why does the result from merge have more rows than original file?](https://stackoverflow.com/questions/24150765/why-does-the-result-from-merge-have-more-rows-than-original-file) with possible solutions. — Henrik, Oct 29 '18 at 15:46
Thank you very much for your replies. I think I figured it out. I am using this tidy %>% distinct(Student, Datev, .keep_all = TRUE) — greekyfreaky, Oct 29 '18 at 15:47

paoloeusebi · Accepted Answer · 2018-10-29T15:57:29.373

0

You need an id for the students and an id for the tests

See if this can help

td1 <- Testa %>% 
     gather(Test1, Test2, key = "Test", value = "Score")

td2 <- Testb %>%
     gather(testdate1, testdate2, key = "Dated", value = "Datev") %>%
     mutate(Test=ifelse(Dated %in% "testdate1", "Test1", "Test2"))
tidy <- left_join(td1, td2)
tidy

edited Oct 29 '18 at 15:57

answered Oct 29 '18 at 15:51

paoloeusebi

1,056
8
19

score 0 · Answer 2 · answered Oct 29 '18 at 16:02

I think the simplest solution is to use stats::reshape which is capable of gathering multiple columns, as suggested in answers to this question:

Reshaping multiple sets of measurement columns (wide format) into single columns (long format)

The interface of stats:reshape is not as beautiful as the tidy-interfaces, but it gets the job done in one function call.

Duplicated rows after reshaping and joining

2 Answers2