2

I have data in which subjects completed multiple ratings per day over 6-7 days. The number of ratings per day varies. The data set includes subject ID, date, and the ratings. I would like to create a new variable that recodes the dates for each subject into "study day" --- so 1 for first day of ratings, 2 for second day of ratings, etc.

For example, I would like to take this:

id  Date    Rating
1   10/20/2018  2
1   10/20/2018  3
1   10/20/2018  5
1   10/21/2018  1
1   10/21/2018  7
1   10/21/2018  9
1   10/22/2018  4
1   10/22/2018  5
1   10/22/2018  9
2   11/15/2018  1
2   11/15/2018  3
2   11/15/2018  4
2   11/16/2018  3
2   11/16/2018  1
2   11/17/2018  0
2   11/17/2018  2
2   11/17/2018  9

And end up with this:

id  Day Date    Rating
1   1   10/20/2018  2
1   1   10/20/2018  3
1   1   10/20/2018  5
1   2   10/21/2018  1
1   2   10/21/2018  7
1   2   10/21/2018  9
1   3   10/22/2018  4
1   3   10/22/2018  5
1   3   10/22/2018  9
2   1   11/15/2018  1
2   1   11/15/2018  3
2   1   11/15/2018  4
2   2   11/16/2018  3
2   2   11/16/2018  1
2   3   11/17/2018  0
2   3   11/17/2018  2
2   3   11/17/2018  9

I was going to look into setting up some kind of loop, but I thought it would be worth asking if there is a more efficient way to pull this off. Are there any functions that would allow me to automate this sort of thing? Thanks very much for any suggestions.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213

2 Answers2

3

Since you want to reset the count after every id , makes this question a bit different.

Using only base R, we can split the Date based on id and then create a count of each distinct group.

df$Day <- unlist(sapply(split(df$Date, df$id), function(x) match(x,unique(x))))


df
#   id       Date Rating Day
#1   1 10/20/2018      2   1
#2   1 10/20/2018      3   1
#3   1 10/20/2018      5   1
#4   1 10/21/2018      1   2
#5   1 10/21/2018      7   2
#6   1 10/21/2018      9   2
#7   1 10/22/2018      4   3
#8   1 10/22/2018      5   3
#9   1 10/22/2018      9   3
#10  2 11/15/2018      1   1
#11  2 11/15/2018      3   1
#12  2 11/15/2018      4   1
#13  2 11/16/2018      3   2
#14  2 11/16/2018      1   2
#15  2 11/17/2018      0   3
#16  2 11/17/2018      2   3
#17  2 11/17/2018      9   3

I don't know how I missed this but thanks to @thelatemail who reminded that this is basically the same as

library(dplyr)
df %>%
  group_by(id) %>%
  mutate(Day = match(Date, unique(Date)))

AND

df$Day <- as.numeric(with(df, ave(Date, id, FUN = function(x) match(x, unique(x)))))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • That is perfect! Thanks so much for your help. – Steve Wilson Dec 17 '18 at 01:56
  • `split(df$Date, df$id, df$Date)` seems odd to me - I don't think you can just keep passing more arguments to `...` to split by. `split(df$Date, list(df$id, df$Date))` gives different results for instance. – thelatemail Dec 17 '18 at 02:33
  • @thelatemail ohh..wait! You are right. I just need to `split` it by `id` no need of `Date`. It ignores `df$Date` here anyway I think. I'll update the answer. Thanks. – Ronak Shah Dec 17 '18 at 02:39
  • Just struck me too - `sapply + split` is `tapply`, so `unlist(tapply(df$Date, df$id, FUN=function(x) match(x,unique(x))))` would do it too if the id's are always in order. Also, `ave` could do it by the same logic - `ave(as.numeric(df$Date), df$id, FUN=function(x) match(x,unique(x)))` – thelatemail Dec 17 '18 at 02:47
  • yes, if ids are in order then those solution would work and it also means that this should be a dupe then? There should be answers giving that kind of output. – Ronak Shah Dec 17 '18 at 02:53
  • Many thanks for the additional comments and options, everyone. It is very helpful to see various alternatives as I try to learn r. And my apologies if I missed another posting addressing this issue. – Steve Wilson Dec 17 '18 at 17:37
1

If you want a slightly hacky dplyr version....you can use the date column and convert it to a numeric date then manipulate that number to give the desired result

library(tidyverse)
library(lubridate)

df <- data_frame(id=c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2),
                     Date= c('10/20/2018', '10/20/2018', '10/20/2018', '10/21/2018', '10/21/2018', '10/21/2018',
                             '10/22/2018', '10/22/2018', '10/22/2018','11/15/2018', '11/15/2018', '11/15/2018',
                             '11/16/2018', '11/16/2018', '11/17/2018', '11/17/2018', '11/17/2018'), 
                     Rating=c(2,3,5,1,7,9,4,5,9,1,3,4,3,1,0,2,9))

df %>%
  group_by(id) %>%
  mutate(
    Date = mdy(Date),
    Day = as.numeric(Date),
    Day = Day-min(Day)+1)

# A tibble: 17 x 4
# Groups:   id [2]
      id Date       Rating   Day
   <dbl> <date>      <dbl> <dbl>
 1     1 2018-10-20      2     1
 2     1 2018-10-20      3     1
 3     1 2018-10-20      5     1
 4     1 2018-10-21      1     2
 5     1 2018-10-21      7     2
 6     1 2018-10-21      9     2
 7     1 2018-10-22      4     3
 8     1 2018-10-22      5     3
 9     1 2018-10-22      9     3
10     2 2018-11-15      1     1
11     2 2018-11-15      3     1
12     2 2018-11-15      4     1
13     2 2018-11-16      3     2
14     2 2018-11-16      1     2
15     2 2018-11-17      0     3
16     2 2018-11-17      2     3
17     2 2018-11-17      9     3
NColl
  • 757
  • 5
  • 19