I'm a beginner in R and am trying to create a new column that will have values determined by values of rows in two columns.
My data frame looks something like this:
df <- data.frame(subjectid = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3), subj_day = c("1_1","1_1","1_1","1_2","1_2","2_1","2_1","2_1","2_2","2_2","3_1","3_1","3_1","3_2","3_2"), done_meditation = c(0,0,1,1,1,0,1,1,0,0,1,1,1,1,1))
> df
subjectid subj_day done_meditation
1 1 1_1 0
2 1 1_1 0
3 1 1_1 1
4 1 1_2 1
5 1 1_2 1
6 2 2_1 0
7 2 2_1 1
8 2 2_1 1
9 2 2_2 0
10 2 2_2 0
11 3 3_1 1
12 3 3_1 1
13 3 3_1 1
14 3 3_2 1
15 3 3_2 1
This data is from a multi-day study, and participants provided multiple responses each day. Each row reflects a response.
The subj_day column shows the participant number and the day of the study for that participant at the time of the response reflected in that row.
The values in 'done_meditation' refer to whether the participant had completed a meditation on that day at the time of their response. If a response was provided after completing a meditation, done_meditation = 1. So, if a participant completed a meditation before any responses on that day, all their done_meditation rows for that day would = 1.
However, if they provided some responses before completing the meditation, and some after, only some of the done_meditation rows would = 1.
I need to create a column that will provide the number of meditations a participant completed across the entire study.
So, the new data frame might look like this:
> df
subjectid subj_day done_meditation total_meditations
1 1 1_1 0 2
2 1 1_1 0 2
3 1 1_1 1 2
4 1 1_2 1 2
5 1 1_2 1 2
6 2 2_1 0 1
7 2 2_1 1 1
8 2 2_1 1 1
9 2 2_2 0 1
10 2 2_2 0 1
11 3 3_1 1 2
12 3 3_1 1 2
13 3 3_1 1 2
14 3 3_2 1 2
15 3 3_2 1 2
Because participant 2 only completed a meditation on 1 of the 2 study days, they receive a score of 1 in every row.
Because participants 1 and 3 completed a meditation on two days of the study, they receive a score of 2 in every row.
It might be useful to know that the actual study was 7 days long, so participants could have a total_meditations score of up to 7.
I have tried to write a few loops without success and have tried to use the group_by and mutate functions in dplyr but without success. I'm very new to R and have been struggling to figure this one out.
Thanks so much in advance for any help!
UPDATE: I think I've found a solution that works. The below seems to do the job. Still open to any new suggestions if someone has a more elegant solution :)
df2 <- df %>% group_by(subjectid, subj_day) %>%
summarise(meditationCount = sum(done_meditation)) %>%
mutate(meditationCount = ifelse(meditationCount >= 1, 1, 0)) %>%
group_by(subjectid) %>%
summarise(total_meditations = sum(meditationCount))
merged.df <- merge(df,df2, by = "subjectid")