How can I create a new column with values assigned based on conditions from rows in two columns?

Question

I'm a beginner in R and am trying to create a new column that will have values determined by values of rows in two columns.

My data frame looks something like this:

df <- data.frame(subjectid = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3), subj_day = c("1_1","1_1","1_1","1_2","1_2","2_1","2_1","2_1","2_2","2_2","3_1","3_1","3_1","3_2","3_2"), done_meditation = c(0,0,1,1,1,0,1,1,0,0,1,1,1,1,1))

> df
   subjectid subj_day done_meditation
1          1      1_1               0
2          1      1_1               0
3          1      1_1               1
4          1      1_2               1
5          1      1_2               1
6          2      2_1               0
7          2      2_1               1
8          2      2_1               1
9          2      2_2               0
10         2      2_2               0
11         3      3_1               1
12         3      3_1               1
13         3      3_1               1
14         3      3_2               1
15         3      3_2               1

This data is from a multi-day study, and participants provided multiple responses each day. Each row reflects a response.

The subj_day column shows the participant number and the day of the study for that participant at the time of the response reflected in that row.

The values in 'done_meditation' refer to whether the participant had completed a meditation on that day at the time of their response. If a response was provided after completing a meditation, done_meditation = 1. So, if a participant completed a meditation before any responses on that day, all their done_meditation rows for that day would = 1.

However, if they provided some responses before completing the meditation, and some after, only some of the done_meditation rows would = 1.

I need to create a column that will provide the number of meditations a participant completed across the entire study.

So, the new data frame might look like this:

> df
   subjectid subj_day done_meditation total_meditations
1          1      1_1               0     2
2          1      1_1               0     2
3          1      1_1               1     2
4          1      1_2               1     2
5          1      1_2               1     2
6          2      2_1               0     1
7          2      2_1               1     1
8          2      2_1               1     1
9          2      2_2               0     1
10         2      2_2               0     1
11         3      3_1               1     2
12         3      3_1               1     2
13         3      3_1               1     2
14         3      3_2               1     2
15         3      3_2               1     2

Because participant 2 only completed a meditation on 1 of the 2 study days, they receive a score of 1 in every row.

Because participants 1 and 3 completed a meditation on two days of the study, they receive a score of 2 in every row.

It might be useful to know that the actual study was 7 days long, so participants could have a total_meditations score of up to 7.

I have tried to write a few loops without success and have tried to use the group_by and mutate functions in dplyr but without success. I'm very new to R and have been struggling to figure this one out.

Thanks so much in advance for any help!

UPDATE: I think I've found a solution that works. The below seems to do the job. Still open to any new suggestions if someone has a more elegant solution :)

df2 <- df %>% group_by(subjectid, subj_day) %>%
  summarise(meditationCount = sum(done_meditation)) %>%
  mutate(meditationCount = ifelse(meditationCount >= 1, 1, 0)) %>%
  group_by(subjectid) %>%
  summarise(total_meditations = sum(meditationCount))
 
merged.df <- merge(df,df2, by = "subjectid")

Try this `dplyr` option: `df %>% group_by(subjectid) %>% mutate(total_meditations = if(0 %in% done_meditation) 1 else 2)` — L Tyrone, Apr 12 '23 at 05:41
Thank you! I really appreciate the help. This works for up to 2 meditations but didn't seem to work for more than 2 meditations (which I have in my actual dataset). I think I've found a solution though and will update my post with the code :) — Deanna, Apr 12 '23 at 09:20
The reason my suggested code didn't work is you'd originally only asked for two outcomes: 1 or 2 days. I'm glad I helped get you started and that you found a more scalable solution. And as per the suggestion from @Andy Baxter, feel free to answer your own question with the solution you found. — L Tyrone, Apr 13 '23 at 00:39
Thank you! I realised my original post had left that unclear so I edited it to hopefully make it more clear. I really appreciate it and it definitely helped me find a solution :) — Deanna, Apr 13 '23 at 00:51

Andy Baxter · Accepted Answer · 2023-04-12T11:17:17.560

Good code in constructing your own answer! A few small suggestions for condensed code:

You could use max to get a 1 or 0 dependend on day instead of sum/ifelse
You can get summarise to drop the subj_day group for you using .groups = "drop_last" so no need for a second group_by call.
Joins can be done in pipe so don't need a newly created dataframe (example below using base R pipe but magrittr %>% pipe also works)

df |> 
  group_by(subjectid, subj_day) |> 
  summarise(mediated = max(done_meditation), .groups = "drop_last") |> 
  summarise(total_meditations = sum(mediated)) |> 
  left_join(df, y = _, by = "subjectid")
    # Or use `%>% left_join(df, y = ., ...)` with magrittr pipe

#>    subjectid subj_day done_meditation total_meditations
#> 1          1      1_1               0                 2
#> 2          1      1_1               0                 2
#> 3          1      1_1               1                 2
#> 4          1      1_2               1                 2
#> 5          1      1_2               1                 2
#> 6          2      2_1               0                 1
#> 7          2      2_1               1                 1
#> 8          2      2_1               1                 1
#> 9          2      2_2               0                 1
#> 10         2      2_2               0                 1
#> 11         3      3_1               1                 2
#> 12         3      3_1               1                 2
#> 13         3      3_1               1                 2
#> 14         3      3_2               1                 2
#> 15         3      3_2               1                 2

P.S. On SO you are allowed to answer your own question, so if you're satisfied with your own solution do post it as an answer and you can accept it to give yourself the well-deserved credit!

score 2 · Answer 2 · answered Apr 12 '23 at 13:36

2

Based on your own and Andy's answer, I'd like to add another suggestion:

You could use mutate() instead of summarise() if the results of max() are normalised with the number of responses per group (subj_day) n(). Then the left_join() is no longer needed:

df %>% mutate(total_meditations = max(done_meditation)/n(), .by = subj_day) %>%  
  mutate(total_meditations = sum(total_meditations), .by = subjectid)

answered Apr 12 '23 at 13:36

DrEspresso

211
5

This is a great way around that tricky problem of maxing by day and summing by subject at the same time whilst keeping the whole dataframe! – Andy Baxter Apr 13 '23 at 09:20

How can I create a new column with values assigned based on conditions from rows in two columns?

2 Answers2