1

I am trying to reshape my data to wide, trying to prep the data frame to a within-subject analysis. It is currently suitable for a between-subject analysis. The most helpful answer I found so far was: Using the reshape function in R with multiple matching rows

My data looks like this:

SubjID  ITEM SubjGroup Strength Timing Accuracy  RT
1       dance     1      1       200       1    1234
1       dance     1      2       200       0     430  
1       dance     1      1       400       1     450 
1       dance     1      2       400       1     200 
2       dance     2      1       200       0     300
2       dance     2      2       200       0     755
2       dance     2      1       400       1     550 
2       dance     2      2       400       1     520

What I need is to basically remove the Strength column and replace it with an additional RT column, so that RT1 refers to the Value I got under Strength1 and RT2 refers to the value I got under Strength2 and do the same for accuracy. Each item is repeated 4x for each Subject (under two different combinations of Strength+timing), which means that by the end of the reshaping, there will be 2 rows for each verb and for each subject instead of 4 per item and per subject. Like this :

SubjID  ITEM SubjGroup        Timing           RT1   RT2
1       dance     1             200           1234   430
1       dance     1             400            450   200 
2       dance     2             200            300   755
2       dance     2             400            550   520 

While either deleting the Accuracy column (if it makes things easier) or adding a second one.

I tried a simple reshape formula (the same as the one posted by OP in the link above) and it somehow worked on a similar dataframe with the same table headings but there was only one subject ID : reshape(datafr, idvar = c("ï..Subject", "Timing", "Group", "ITEM"), timevar= "TMS", direction = "wide") When I tried it with the data above it did create the right column but it was empty and I got the warning message. I also tried with Pivot but it did not work.
Thanks!

Mary Bay
  • 11
  • 2
  • Could you (a) edit your sample input to make it easy to import in R? `dput()` is easiest, if you have sample data `dput(your_data[1:8, ])` will make a copy/pasteable version of the first 8 rows. It would be good to know if the blanks you show are empty strings, `NA` values, or something else. – Gregor Thomas May 14 '21 at 19:03
  • (b) Could you explicitly show what output you're looking for for that sample input? I don't know what you mean by `Strength1` and `Strength2`, and *"Accuracy will refer to one of the two"* is confusing --which one, how do we know which one? You may need to edit the sample input to make it fully illustrate the problem. – Gregor Thomas May 14 '21 at 19:04
  • Also *"there will be 2 rows for each verb and for each subject instead of 4 per item"* - I see the `SubjID` so I can guess what a subject is. What do you mean by "verb"? – Gregor Thomas May 14 '21 at 19:07
  • @GregorThomas done, thanks! – Mary Bay May 14 '21 at 19:18
  • @GregorThomas I noticed I will either have to duplicate the Accuracy column or add a second one, but I can get rid of it if it makes things easier. – Mary Bay May 14 '21 at 19:21

1 Answers1

1

reshape2

out <- reshape2::dcast(dat, SubjID + ITEM + SubjGroup + Timing ~ Strength, value.var = "RT")
names(out)
# [1] "SubjID"    "ITEM"      "SubjGroup" "Timing"    "1"         "2"        
names(out)[5:6] <- paste0("RT", names(out)[5:6])
out
#   SubjID  ITEM SubjGroup Timing  RT1 RT2
# 1      1 dance         1    200 1234 430
# 2      1 dance         1    400  450 200
# 3      2 dance         2    200  300 755
# 4      2 dance         2    400  550 520

tidyr

tidyr::pivot_wider(dat, c(SubjID, ITEM, SubjGroup, Timing),
                   names_prefix = "RT", names_from = "Strength", values_from = "RT")
# # A tibble: 4 x 6
#   SubjID ITEM  SubjGroup Timing   RT1   RT2
#    <int> <chr>     <int>  <int> <int> <int>
# 1      1 dance         1    200  1234   430
# 2      1 dance         1    400   450   200
# 3      2 dance         2    200   300   755
# 4      2 dance         2    400   550   520

or even

tidyr::pivot_wider(dat, c(SubjID, ITEM, SubjGroup, Timing),
                   names_from = "Strength", values_from = c("Accuracy", "RT"),
                   names_sep = "")
# # A tibble: 4 x 8
#   SubjID ITEM  SubjGroup Timing Accuracy1 Accuracy2   RT1   RT2
#    <int> <chr>     <int>  <int>     <int>     <int> <int> <int>
# 1      1 dance         1    200         1         0  1234   430
# 2      1 dance         1    400         1         1   450   200
# 3      2 dance         2    200         0         0   300   755
# 4      2 dance         2    400         1         1   550   520

r2evans
  • 141,215
  • 6
  • 77
  • 149
  • thanks this was helpful. However, the first option resulted in some missing rows (3 total per item instead of 4, so a row was missing for some participants), and only columns with Accuracy values, the second and third one created the second RT column but it was full of NULL. – Mary Bay May 14 '21 at 20:54
  • Gregor's first suggestion to provide the output from `dput(your_data[1:8, ])` is still the gold-standard for sample data. Please adjust the row subset (`1:8`) so that the sample you give us reproduces the symptoms you describe here in your comment. – r2evans May 14 '21 at 21:12