I have a dataset with answers to a Likert scale and reaction times, both of which result from an experimental manipulation. Ideally, I would like to copy each Likert_Answer value to every row of the experimental condition it belongs to.

The dataset looks like this:

x <- rep(c(NA, round(runif(5, min=0, max=100), 2)), times=3)

myDF <- data.frame(ID = rep(c(1,2,3), each=6),
               Condition = rep(c("A","B"), each=3, times=3),
               Type_of_Task = rep(c("Test", rep(c("Experiment"), times=2)), times=6),
               Likert_Answer = c(5, NA, NA, 6, NA, NA, 1, NA, NA, 5, NA, NA, 5, NA, NA, 1, NA, NA),
               Reaction_Times = x)

I find it very hard to formulate the problem, so this is what my expected output should look like:

myDF_Output <- data.frame(ID = rep(c(1,2,3), each=6),
               Condition = rep(c("A","B"), each=3, times=3),
               Type_of_Task = rep(c("Test", rep(c("Experiment"), times=2)), times=6),
               Likert_Answer = rep(c(5, 6, 1, 5, 5, 1), each = 3),
               Reaction_Times = x)

I found a feasible solution in this post:

library(dplyr)
library(tidyr)

myDF2 <- myDF %>% 
  group_by(ID) %>% 
  fill(Likert_Answer) %>% 
  fill(Likert_Answer, .direction = "up")

The problem is that this solution only works as long as the person answers the Likert scale. If they do not, I am afraid it would "drag" the Likert answer from the previous experimental condition into the next one. For example:

myDF_missing <- myDF
myDF_missing[4,4] = NA

myDF3 <- myDF_missing %>% 
  group_by(ID) %>% 
  fill(Likert_Answer) %>% 
  fill(Likert_Answer, .direction = "up")

In this case, what should have been NA in Likert_Answer for all rows in condition B for ID 1 has become 5. Any idea how I could avoid this?

(Excuse me if the code is dirty: I am quite new to R and I am learning the hard way... But I got pretty stuck with this problem at this stage.)

Lucas

1 Answer

If I understood your problem correctly, you are very close to a solution. I modified the demo data frame to show how the grouping works:

library(dplyr)
library(tidyr)

myDF <- data.frame(ID = rep(c(1,2,3), each=6),
                   Condition = rep(c("A","B"), each=3, times=3),
                   Type_of_Task = rep(c("Test", rep(c("Experiment"), times=5)), times=3),
                   Likert_Answer = c(5, NA, NA, 6, NA, NA, 1, NA, NA, 5, NA, NA, NA, NA, NA, 1, NA, NA),
                   Reaction_Times = x)


myDF %>% 
  dplyr::group_by(ID) %>% 
  tidyr::fill(Likert_Answer)

      ID Condition Type_of_Task Likert_Answer Reaction_Times
   <dbl> <chr>     <chr>                <dbl>          <dbl>
 1     1 A         Test                     5           NA  
 2     1 A         Experiment               5           18.4
 3     1 A         Experiment               5           41.1
 4     1 B         Experiment               6           59.8
 5     1 B         Experiment               6           93.4
 6     1 B         Experiment               6           38.5
 7     2 A         Test                     1           NA  
 8     2 A         Experiment               1           18.4
 9     2 A         Experiment               1           41.1
10     2 B         Experiment               5           59.8
11     2 B         Experiment               5           93.4
12     2 B         Experiment               5           38.5
13     3 A         Test                    NA           NA  
14     3 A         Experiment              NA           18.4
15     3 A         Experiment              NA           41.1
16     3 B         Experiment               1           59.8
17     3 B         Experiment               1           93.4
18     3 B         Experiment               1           38.5
DPH
  • With your reply I just realised that I made a mistake in the code I used to explain my problem! The first cell in Type_of_Task that corresponds to condition B should be "Test" and not "Experiment". I will edit the question and try to make myself clear. Sorry about that! – Lucas Nov 20 '20 at 13:59
  • I have fixed the code and edited the question. I hope my problem is clearer now. Basically, I want the values in the column to be filled within the same ID, but also within the same Condition. That way, if for example participant ID 1 does not answer the Likert scale for condition B, the result remains NA and does not get filled with the answer from the test for condition A. – Lucas Nov 20 '20 at 14:08
  • @Lucas: you have to use two variables for grouping: myDF_missing %>% dplyr::group_by(ID, Condition) %>% tidyr::fill(Likert_Answer) – DPH Nov 22 '20 at 22:33
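
For reference, here is a minimal sketch of the two-variable grouping suggested in the last comment. The data frame `demo` and its values are made up for illustration (one participant who answered the Likert scale for condition A but not for condition B), and `.direction = "downup"` is used on the assumption that a single fill in both directions is wanted, in place of the two separate fill() calls in the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the question's data: one participant, two
# conditions, with the Likert answer for condition B missing.
demo <- data.frame(
  ID            = rep(1, 6),
  Condition     = rep(c("A", "B"), each = 3),
  Type_of_Task  = rep(c("Test", "Experiment", "Experiment"), times = 2),
  Likert_Answer = c(5, NA, NA, NA, NA, NA)
)

# Grouping by both ID and Condition stops fill() at the condition boundary,
# so the condition A answer is not carried into condition B.
filled <- demo %>%
  group_by(ID, Condition) %>%
  fill(Likert_Answer, .direction = "downup") %>%
  ungroup()

filled$Likert_Answer
```

With this grouping, the three condition A rows should all hold 5, while the condition B rows stay NA because there is no answer in that group to fill from.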