2

I am analyzing the student test data, but before that I want to tidy it. I tried to build a tidy data frame but it seems like the rows are duplicating. Below is my code.

library(tidyverse)
(Test <- tibble(
  Student = c("A", "B", "C", "D", "E"),
  Test1 = c("SAT", "SAT", "SAT", "SAT", "SAT"),
  Test2 = c("NA", "ACT", "ACT", "ACT", "ACT"),
  testdate1 = c("7/1/2017", "6/1/2017", "3/1/2017", "2/17/2018", "NA"),
  testdate2 = c("NA", "NA", "1/1/2016", "12/1/2016", "10/1/2016")
))


(Testa <- tibble(
  Student = c("A", "B", "C", "D", "E"),
  Test1 = c("SAT", "SAT", "SAT", "SAT", "SAT"),
  Test2 = c("NA", "ACT", "ACT", "ACT", "ACT")
))


(Testb <- tibble(
  Student = c("A", "B", "C", "D", "E"),
  testdate1 = c("7/1/2017", "6/1/2017", "3/1/2017", "2/17/2018", "NA"),
  testdate2 = c("NA", "NA", "1/1/2016", "12/1/2016", "10/1/2016")
))

(td1 <- Testa %>% 
    gather(Test1, Test2, key = "Test", value = "Score"))
(td2 <- Testb %>% 
    gather(testdate1, testdate2, key = "Dated", value = "Datev"))
(tidy <- left_join(td1, td2))

Can anyone please help me solve this issue. Below is the image of how I want to see the data.

enter image description here

camille
  • 16,432
  • 18
  • 38
  • 60
  • If the ids are not unique in each of the joining datasets, it would result in duplication – akrun Oct 29 '18 at 15:37
  • Similar for `base::merge`: [Why does the result from merge have more rows than original file?](https://stackoverflow.com/questions/24150765/why-does-the-result-from-merge-have-more-rows-than-original-file) with possible solutions. – Henrik Oct 29 '18 at 15:46
  • Thank you very much for your replies. I think I figured it out. I am using this tidy %>% distinct(Student, Datev, .keep_all = TRUE) – greekyfreaky Oct 29 '18 at 15:47

2 Answers2

0

You need an id for the students and an id for the tests

See if this can help

td1 <- Testa %>% 
     gather(Test1, Test2, key = "Test", value = "Score")

td2 <- Testb %>%
     gather(testdate1, testdate2, key = "Dated", value = "Datev") %>%
     mutate(Test=ifelse(Dated %in% "testdate1", "Test1", "Test2"))
tidy <- left_join(td1, td2)
tidy
paoloeusebi
  • 1,056
  • 8
  • 19
0

I think the simplest solution is to use stats::reshape which is capable of gathering multiple columns, as suggested in answers to this question:

Reshaping multiple sets of measurement columns (wide format) into single columns (long format)

The interface of stats:reshape is not as beautiful as the tidy-interfaces, but it gets the job done in one function call.

snaut
  • 2,261
  • 18
  • 37