I would like to convert wide data to long data in R. My data set is for cross-classified models that explore participants' responses to target items, each of which has different characteristics.

(image: screenshot of the wide-format data; not reproduced here)

  • condition is one of the two conditions to which participants were assigned.
  • The participants were tested twice: t1 and t2.
  • The item-level predictor variables, x1 and x2, are coded for each item.
  • The response variable codes whether a participant's response to the item was right or wrong.
  • Two test formats were administered: test1 and test2.

Although there are many tutorials on wide-to-long conversion, I could not find one that specifically explains the conversion for cross-classified models.

I would like to use tidyverse if possible for the sake of consistency.

My sample data is the following:

structure(list(item_name = c("x1", "x2", "participant_id", "1", 
"2", "3", "4", "5", "6", "7"), participant_variable_1 = c(NA, 
NA, NA, 20, 23, 21, 20, 19, 22, 30), condition = c(NA, NA, NA, 
"A", "B", "A", "B", "A", "B", "A"), t1.item1.test1 = c(1, 3, 
NA, 0, 1, 0, 1, 0, 0, 1), t1.item2.test1 = c(2, 2, NA, 0, 0, 
0, 1, 1, 0, 1), t1.item3.test1 = c(1, 3, NA, 0, 0, 0, 1, 0, 0, 
0), t1.item4.test1 = c(3, 1, NA, 1, 0, 0, 0, 1, 1, 0), t2.item1.test1 = c(1, 
3, NA, 0, 1, 1, 0, 1, 1, 1), t2.item2.test1 = c(2, 2, NA, 1, 
0, 1, 0, 1, 0, 1), t2.item3.test1 = c(1, 3, NA, 0, 0, 0, 1, 0, 
0, 0), t2.item4.test1 = c(3, 1, NA, 1, 1, 0, 1, 1, 1, 0), t1.item1.test2 = c(1, 
3, NA, 0, 1, 0, 1, 0, 0, 1), t1.item2.test2 = c(2, 2, NA, 0, 
0, 0, 1, 1, 0, 1), t1.item3.test2 = c(1, 3, NA, 0, 0, 0, 1, 0, 
0, 0), t1.item4.test2 = c(3, 1, NA, 1, 0, 0, 0, 1, 1, 0), t2.item1.test2 = c(1, 
3, NA, 0, 1, 1, 0, 1, 1, 1), t2.item2.test2 = c(2, 2, NA, 1, 
0, 1, 0, 1, 0, 1), t2.item3.test2 = c(1, 3, NA, 0, 0, 0, 1, 0, 
0, 0), t2.item4.test2 = c(3, 1, NA, 1, 1, 0, 1, 1, 1, 0)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

I would like to have long data that looks like the following:

(image: screenshot of the desired long-format data; not reproduced here)

Please and thank you for your guidance!

user8460166
  • Is this a widely used data format, or totally custom? If you can link to an explanation of this format that might help. I think it can be done either way though. – Marius Jul 10 '19 at 23:27
  • I'd go back to try to fix whatever is creating your original dataset if at all possible. You essentially have two datasets stuck into one which is making this more complex than it needs to be. – thelatemail Jul 10 '19 at 23:28
  • Thank you both for your comments. This is totally custom. I have been working with repeated-measures ANOVA type data but was asked to run a mixed-effects model, so I am still trying to figure out how to code the data and how to convert it into a long format. I would also appreciate it if you could show me a better way to code this data. Thank you very much in advance. – user8460166 Jul 10 '19 at 23:33

1 Answer

This answer relies heavily on the new pivot_ functions (pivot_longer() and pivot_wider()) in the dev version of tidyr. You can install it with devtools::install_github("tidyverse/tidyr") if you're willing to run the dev version.
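Before running the rest, it may be worth checking whether the installed tidyr already exports the pivot functions. The following is only a small sketch of such a check; the variable name `has_pivot` is made up for illustration.

```r
# Check whether tidyr is installed and exports pivot_longer()/pivot_wider().
# `has_pivot` is a hypothetical helper variable, not part of any package.
has_pivot <- requireNamespace("tidyr", quietly = TRUE) &&
  all(c("pivot_longer", "pivot_wider") %in% getNamespaceExports("tidyr"))

if (!has_pivot) {
  message("pivot_longer()/pivot_wider() not found; a newer tidyr is needed.")
}
```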

First we split the data into item and participant info - you're not really getting any benefit from storing both in the same table:

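library(tidyverse)  # provides %>% and rename()

# `dat` holds the question's data: rows 1-2 are x1/x2, row 3 is a label row
item_info = dat[1:2, ]
participant_info = dat[4:nrow(dat), ] %>%
    rename(participant_id = item_name)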
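Indexing by row position depends on the exact layout of the table. As an alternative sketch (not what the code above does), the same split could be made on the values in `item_name`. The small `dat` below is a made-up stand-in with the same layout as the question's data, and `item_rows` is an illustrative name.

```r
library(dplyr)

# Toy stand-in for the question's data: two item rows, a label row,
# then one row per participant.
dat <- tibble::tibble(
  item_name = c("x1", "x2", "participant_id", "1", "2"),
  condition = c(NA, NA, NA, "A", "B"),
  t1.item1  = c(1, 3, NA, 0, 1)
)

# Rows that carry item-level predictors (assumed to be labelled x1 and x2).
item_rows <- c("x1", "x2")

item_info <- dat %>%
  filter(item_name %in% item_rows)

# Everything that is neither an item row nor the label row is a participant.
participant_info <- dat %>%
  filter(!item_name %in% c(item_rows, "participant_id")) %>%
  rename(participant_id = item_name)
```

This gives the same two tables as the positional split, but keeps working if extra item-level rows are added, provided `item_rows` is updated to match.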

Then it's time for a lot of pivoting:

# I have the dev version of tidyr so that is being loaded
library(tidyverse)

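# Item-level predictors: one row per time/item, with x1 and x2 as columns.
# Column names here have the form t1.item1 (no test-format suffix).
item_long = item_info %>%
    select(-participant_variable_1, -condition) %>%
    pivot_longer(
        cols = t1.item1:t2.item4,
        names_to = c("time", "item"),
        names_pattern = "t(\\d)\\.(item\\d)"
    ) %>%
    pivot_wider(names_from = item_name, values_from = value)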

# Participant responses: one row per participant/time/item.
participant_long = participant_info %>%
    pivot_longer(
        cols = t1.item1:t2.item4,
        names_to = c("time", "item"),
        names_pattern = "t(\\d)\\.(item\\d)",
        values_to = "response"
    )

combined = participant_long %>%
    left_join(item_long, by = c("item", "time"))

Result:

> combined
# A tibble: 56 x 8
   participant_id participant_variable_1 condition time  item  response    x1    x2
   <chr>                           <dbl> <chr>     <chr> <chr>    <dbl> <dbl> <dbl>
 1 1                                  20 A         1     item1        0     1     3
 2 1                                  20 A         1     item2        0     2     2
 3 1                                  20 A         1     item3        0     1     3
 4 1                                  20 A         1     item4        1     3     1
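The code above was written for column names without the test-format suffix. A minimal sketch of how the same steps might be extended to names such as t1.item1.test1 follows. It assumes a reduced toy version of the question's data (two items, two participants), takes the column name `test_format` from the comments, and swaps the column range for a `matches()` selector; `name_parts` and `name_regex` are illustrative names.

```r
library(dplyr)
library(tidyr)

# Reduced stand-in for the question's data: same layout, fewer rows and columns.
dat <- tibble::tibble(
  item_name = c("x1", "x2", "participant_id", "1", "2"),
  participant_variable_1 = c(NA, NA, NA, 20, 23),
  condition = c(NA, NA, NA, "A", "B"),
  t1.item1.test1 = c(1, 3, NA, 0, 1),
  t1.item2.test1 = c(2, 2, NA, 0, 0),
  t2.item1.test1 = c(1, 3, NA, 0, 1),
  t2.item2.test1 = c(2, 2, NA, 1, 0),
  t1.item1.test2 = c(1, 3, NA, 0, 1),
  t1.item2.test2 = c(2, 2, NA, 0, 0),
  t2.item1.test2 = c(1, 3, NA, 0, 1),
  t2.item2.test2 = c(2, 2, NA, 1, 0)
)

item_info <- dat[1:2, ]
participant_info <- dat[4:nrow(dat), ] %>%
  rename(participant_id = item_name)

# The three pieces encoded in each column name, and the regex that extracts them.
name_parts <- c("time", "item", "test_format")
name_regex <- "t(\\d)\\.(item\\d)\\.(test\\d)"

# Item-level predictors: one row per time/item/test_format, x1 and x2 as columns.
item_long <- item_info %>%
  select(-participant_variable_1, -condition) %>%
  pivot_longer(
    cols = matches("^t\\d\\.item\\d\\.test\\d$"),
    names_to = name_parts,
    names_pattern = name_regex
  ) %>%
  pivot_wider(names_from = item_name, values_from = value)

# Participant responses: one row per participant/time/item/test_format.
participant_long <- participant_info %>%
  pivot_longer(
    cols = matches("^t\\d\\.item\\d\\.test\\d$"),
    names_to = name_parts,
    names_pattern = name_regex,
    values_to = "response"
  )

# Attach the item-level predictors to every response row.
combined <- participant_long %>%
  left_join(item_long, by = name_parts)
```

The only changes relative to the two-part version are the third capture group in the regex, the extra entry in `names_to`, and joining on all three keys.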
Marius
  • I did this before you added the extra wrinkle of the test format, but that shouldn't be too different to what's already being done - it would just need to be added to the `pivot_longer` calls. – Marius Jul 10 '19 at 23:40
  • Thank you very much, @Marius! This is so amazing!! Just before you submitted the answer, I further complicated the dataset by adding `test_format`. If I could bug you one more time, would it be possible for you to teach me how to further sort my data out by distinguishing `test_format` as well? Please and thank you. – user8460166 Jul 10 '19 at 23:44
  • Sorry, I didn't see your comment before asking my follow-up question. I was able to sort out the test format by following your code. Thank you very much for your time and help. I hope you have a great day today. :) – user8460166 Jul 11 '19 at 00:06