Some fellow students and I created a Qualtrics survey for the course Judicial Lawmaking. We worked with 4 case vignettes. Each respondent first answered some general questions and then answered one case. They were first asked whether alimony should be granted and, in a second question, how much. Only those who answered yes saw the second question.

We have now imported the data into R. Since each respondent answered only 1 case and left the other 3 open, there are a lot of missing values. I am trying to create a dataset without all the unanswered questions. However, I only manage to keep the yes answers. Alternatively, I managed to remove the NAs, but then the first question no longer seems to be linked to the second one: if Q7 was answered yes, the next column should be Q8, but I see that the first column says Q7 and the second says Q12, for example. I will add the code I wrote, but I am a law student, so my understanding of all this is rather limited. I added a simplified example. The numbers 1 to 4 represent the 4 different cases.

library(tidyr)  # provides pivot_longer(), drop_na() and the %>% pipe

age <- c("18-30","18-30","31-45", 60)
YesNo1 <- c("Yes", NA,NA,NA)
Height1 <- c(250,NA,NA,NA)
YesNo2 <- c(NA,"NO",NA,NA)
Height2 <- c(NA,NA,NA,NA)
YesNo3 <- c(NA,NA,"Yes", NA)
Height3 <- c(NA,NA,320,NA)
YesNo4 <- c(NA,NA,NA,"yes")
Height4 <- c(NA,NA,NA, 290)

Test <- data.frame(age, YesNo1, Height1, YesNo2, Height2, 
                  YesNo3, Height3, YesNo4,Height4)


#inspect the data
Test


# reduce the columns 

mi <- pivot_longer(Test, c(YesNo1, YesNo2, YesNo3, YesNo4), 
                         names_to = "decision", values_to = "yes/no")

mi1 <- pivot_longer(mi, c(Height1, Height2, Height3, Height4), 
                    names_to = "alimony", values_to = "height")

#drop the NA rows
mi2 <- mi1 %>% drop_na('yes/no')

In an ideal world I would like to have one dataset with the general questions, followed by a column with the number of the yes/no question and a column with its answer, and then a column with the number of the question on how much alimony should be granted and a column with its answer. The question numbers should always match (7 and 8, 9 and 10, ...). I hope this is clear and someone can help me with it. I translated my problem into a simplified version. When you run it in R, you can see that each answer shows up 4 times (4 times Yes, 4 times No), while I only want to keep 1 Yes and 1 No. But I can't delete the remaining rows that contain NA, since that would also delete the question that was answered No. Do you have any idea how I can fix this, please?
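The repeated rows described above are what one would expect when the YesNo columns and the Height columns are pivoted in two separate steps: every row produced by the first pivot is then combined with all four Height columns. A minimal sketch that counts the rows at each step, using the example data and assuming the tidyr package is installed (the column names `answer` and `amount` are only illustrative):

```r
library(tidyr)

# Example data from the question (one answered case per respondent)
Test <- data.frame(
  age     = c("18-30", "18-30", "31-45", "60"),
  YesNo1  = c("Yes", NA, NA, NA),  Height1 = c(250, NA, NA, NA),
  YesNo2  = c(NA, "NO", NA, NA),   Height2 = c(NA, NA, NA, NA),
  YesNo3  = c(NA, NA, "Yes", NA),  Height3 = c(NA, NA, 320, NA),
  YesNo4  = c(NA, NA, NA, "yes"),  Height4 = c(NA, NA, NA, 290)
)

# First pivot: 4 respondents x 4 YesNo columns
step1 <- pivot_longer(Test, c(YesNo1, YesNo2, YesNo3, YesNo4),
                      names_to = "decision", values_to = "answer")

# Second pivot: each of those rows x 4 Height columns
step2 <- pivot_longer(step1, c(Height1, Height2, Height3, Height4),
                      names_to = "alimony", values_to = "amount")

# Dropping rows without an answer still leaves 4 rows per respondent
kept <- drop_na(step2, answer)

nrow(step1)  # 16
nrow(step2)  # 64
nrow(kept)   # 16

# Only the rows where the case numbers agree are genuine pairs
matched <- sub("YesNo", "", kept$decision) == sub("Height", "", kept$alimony)
sum(matched) # 4
```

Only 4 of the 16 remaining rows pair a YesNo question with the Height question of the same case, which is why the two kinds of column need to be reshaped together rather than one after the other.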

Michiel
  • Hi Michiel, code starting with `read.csv2("Data2.csv"` is not reproducible when `Data2.csv` is not given. – Bernhard Apr 11 '22 at 10:55
  • I am afraid I cannot share the document for privacy reasons – Michiel Apr 11 '22 at 12:51
  • I am just saying you'll increase your chances of a good answer if you provide some reproducible example in form of code that represents your problem and is suitable for providing an answer: https://stackoverflow.com/help/minimal-reproducible-example – Bernhard Apr 11 '22 at 12:59
  • Okay, I will try to add it! Thanks for the tip. – Michiel Apr 11 '22 at 13:10

2 Answers

Apparently you want to use tidyr. I am not fluent in the tidyverse, so I'd like to show you an approach using standard R and the stack function. Taking your data example:

age <- c("18-30","18-30","31-45", 60)
YesNo1 <- c("Yes", NA,NA,NA)
Height1 <- c(250,NA,NA,NA)
YesNo2 <- c(NA,"NO",NA,NA)
Height2 <- c(NA,NA,NA,NA)
YesNo3 <- c(NA,NA,"Yes", NA)
Height3 <- c(NA,NA,320,NA)
YesNo4 <- c(NA,NA,NA,"yes")
Height4 <- c(NA,NA,NA, 290)

Test <- data.frame(age, YesNo1, Height1, YesNo2, Height2, 
                   YesNo3, Height3, YesNo4,Height4)

we can now stack the YesNo columns and the Height columns on top of each other, calling the result stacked:

stacked <- data.frame(age = Test$age,
               yesno = stack(Test, select = c("YesNo1", "YesNo2", "YesNo3", "YesNo4")),
               height = stack(Test, select = c("Height1", "Height2", "Height3", "Height4"))
                )
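To see what `stack` contributes on its own, here is a small sketch on a made-up two-column data frame (the names `d`, `s` and `wrapped` are purely illustrative and not part of the survey data):

```r
# Tiny illustrative data frame (not the survey data)
d <- data.frame(a = c(1, NA), b = c(NA, 2))

# stack() returns a data frame with two columns: values and ind
s <- stack(d)
s$values             # 1 NA NA 2: the columns placed on top of each other
as.character(s$ind)  # "a" "a" "b" "b": the column each value came from

# Passing a data frame as a named argument to data.frame() prefixes its
# column names, which is where yesno.values and yesno.ind come from
wrapped <- data.frame(yesno = stack(d))
names(wrapped)       # "yesno.values" "yesno.ind"
```

Note that `age = Test$age` only has 4 values and is recycled to fill all 16 stacked rows. This lines up with the right respondent because `stack` keeps the original row order within each stacked column.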

If you print(stacked) you'll see a lot of NA. So in the next (and final) step, we delete all rows that have an NA in the yesno.values column:

stacked <- stacked[!is.na(stacked$yesno.values),]
print(stacked)

And the result is what I understood from your question to be the goal:

> print(stacked)
     age yesno.values yesno.ind height.values height.ind
1  18-30          Yes    YesNo1           250    Height1
6  18-30           NO    YesNo2            NA    Height2
11 31-45          Yes    YesNo3           320    Height3
16    60          yes    YesNo4           290    Height4

Sorry for this not being a tidyverse answer. At least, the No answer was kept in the data.
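For readers who would rather stay within tidyr, a possible equivalent is sketched below. It is not part of the approach above and assumes tidyr 1.0.0 or later, where `pivot_longer` accepts the special `".value"` name. The regular expression and the column name `case` are choices made for this illustration.

```r
library(tidyr)

# Example data from the question
Test <- data.frame(
  age     = c("18-30", "18-30", "31-45", "60"),
  YesNo1  = c("Yes", NA, NA, NA),  Height1 = c(250, NA, NA, NA),
  YesNo2  = c(NA, "NO", NA, NA),   Height2 = c(NA, NA, NA, NA),
  YesNo3  = c(NA, NA, "Yes", NA),  Height3 = c(NA, NA, 320, NA),
  YesNo4  = c(NA, NA, NA, "yes"),  Height4 = c(NA, NA, NA, 290)
)

# Split each column name into its stem (YesNo / Height) and its case number.
# ".value" tells pivot_longer() to keep one output column per stem, so the
# YesNo and Height answers of the same case stay on the same row.
long <- pivot_longer(
  Test,
  cols          = -age,
  names_to      = c(".value", "case"),
  names_pattern = "^(YesNo|Height)([0-9]+)$"
)

# Keep only the case each respondent actually answered (Yes or No)
answered <- drop_na(long, YesNo)
print(answered)
```

With the example data this should leave one row per respondent, with the No answer kept and its Height left as NA.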

Bernhard
  • Thank you so much!!!! This is exactly what I needed, I will try to fit it to my larger dataset. It does not matter that it isn't tidyverse, as long as it works – Michiel Apr 11 '22 at 15:08
  • I tried applying it to my larger dataset. In the last step, however, nothing happens: the columns are stacked on top of each other, but the NAs do not disappear – Michiel Apr 11 '22 at 15:25

This is your solution applied to my larger dataset, @bernhard:

Test <- read.csv2("Data2.csv", header = TRUE, sep = ",")
#inspect the data
Test
#select data
Test1 <- Test[,11:24]
#fill in NA
Test2 <- Test1
Test2[Test2 == ""] <- NA

stacked1 <- data.frame(Q1 = Test2$Q1, Q2 = Test2$Q2, Q3 = Test2$Q3,
                       Q4 = Test2$Q4, Q5 = Test2$Q5, Q6 = Test2$Q6,
                      yesno = stack(Test2, select = c("Q7", "Q9", "Q11", "Q13")),
                      height = stack(Test2, select = c("Q8", "Q10", "Q12", "Q14")))
stacked1[stacked1 == ""] <- NA
stacked1 <- stacked1[!is.na(stacked$yesno.values),]
print(stacked2)

As mentioned in my comment, the NAs do not disappear, but the code doesn't give an error either.

Michiel
  • I think I solved it using: stacked2 <- stacked1 %>% drop_na('yesno.values') Thank you a thousand times, I have been struggling for several days!!! <3 – Michiel Apr 11 '22 at 15:49
  • Hi! Glad you solved it. Please do not hide the final solution in a comment, where people are unlikely to find it. You can edit/append your answer or just edit/append your question to make it more visible. Cheers! – Bernhard Apr 12 '22 at 08:16
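
Putting the outcome of this thread in one place: in the code posted for the larger dataset, the filter line indexes `stacked1` with `stacked$yesno.values`, which is a column of a different object, and the last line prints `stacked2`, which that code never creates. If objects with those names were still in the workspace from the small example, that would plausibly explain why nothing useful was dropped and no error appeared, though this cannot be confirmed without the data. A minimal sketch of the corrected filter, using a small made-up data frame in place of `Data2.csv` (its contents are hypothetical; only the Q-column naming follows the thread):

```r
# Hypothetical stand-in for the real survey data (Data2.csv is not available)
Test2 <- data.frame(
  Q1  = c("a", "b", "c"),     # a general question
  Q7  = c("Yes", "", ""),     # case 1: grant alimony?
  Q8  = c("250", "", ""),     # case 1: how much?
  Q9  = c("", "No", "Yes"),   # case 2: grant alimony?
  Q10 = c("", "", "300")      # case 2: how much?
)

# Empty strings from the export become NA
Test2[Test2 == ""] <- NA

stacked1 <- data.frame(
  Q1     = Test2$Q1,
  yesno  = stack(Test2, select = c("Q7", "Q9")),
  height = stack(Test2, select = c("Q8", "Q10"))
)

# Filter on a column of the SAME object that is being subset,
# and store the result under the name that is printed afterwards
stacked2 <- stacked1[!is.na(stacked1$yesno.values), ]
print(stacked2)

# Equivalent with tidyr, as in the final comment (only if tidyr is installed)
if (requireNamespace("tidyr", quietly = TRUE)) {
  stacked2_tidy <- tidyr::drop_na(stacked1, yesno.values)
  print(stacked2_tidy)
}
```

Both variants keep one row per respondent, including the No answer, and the question numbers in `yesno.ind` and `height.ind` stay paired (Q7 with Q8, Q9 with Q10).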