-1

How to check in a column whether dates are one off from each other

Jane Miller
  • 153
  • 9

1 Answers1

2

With tidyverse you can group_by both the id as well as a second id id2 that would group the rows together that are separated by a difference of one day. Then, the number of consecutive days column would include the difference between the last date and the first date (or zero if not the first date).

library(tidyverse)

data %>%
  mutate(date = as.Date(date, format = "%m-%d-%Y")) %>%
  arrange(id, date) %>%
  group_by(id) %>%
  group_by(id2 = cumsum(c(T, diff(date) > 1)), .add = T) %>%
  mutate(num_con_days = ifelse(date == first(date), last(date) - date + 1, 0)) %>%
  ungroup %>%
  select(-id2)

Output

     day    id date       num_con_days
   <dbl> <dbl> <date>            <dbl>
 1     1    10 2021-01-01            1
 2     3    10 2021-01-03            4
 3     4    10 2021-01-04            0
 4     5    10 2021-01-05            0
 5     6    10 2021-01-06            0
 6     1    24 2021-01-01            2
 7     2    24 2021-01-02            0
 8     4    24 2021-01-04            3
 9     5    24 2021-01-05            0
10     6    24 2021-01-06            0

Edit: Using your other example, with renamed column names, you have the following data.frame:

   id        day num_consecutive_days
1   1 2021-01-02                    1
2   2 2021-01-02                    1
3   2 2021-01-05                    2
4   2 2021-01-06                    0
5   2 2021-01-12                    1
6   3 2021-01-01                    2
7   3 2021-01-02                    0
8   3 2021-01-04                    1
9   3 2021-01-11                    1
10  4 2021-01-01                    1

Here, your day is in Year-Month-Day format (so in converting to a Date, you don't need to provide a separate format).

Also, you will need to make sure your column names match and are consistent for day. See below similar code - this should be the same as your desired output.

df %>%
  mutate(day = as.Date(day)) %>%
  arrange(id, day) %>%
  group_by(id) %>%
  group_by(id2 = cumsum(c(T, diff(day) > 1)), .add = T) %>%
  mutate(num_con_days = ifelse(day == first(day), last(day) - day + 1, 0)) %>%
  ungroup %>%
  select(-id2)

Output

   id        day num_consecutive_days
1   1 2021-01-02                    1
2   2 2021-01-02                    1
3   2 2021-01-05                    2
4   2 2021-01-06                    0
5   2 2021-01-12                    1
6   3 2021-01-01                    2
7   3 2021-01-02                    0
8   3 2021-01-04                    1
9   3 2021-01-11                    1
10  4 2021-01-01                    1
Ben
  • 28,684
  • 5
  • 23
  • 45
  • Thank you! could you explain the logic behind the ifelse statement in the mutate function? I am a little confused on how to interpret it – Jane Miller Nov 19 '21 at 02:12
  • The ifelse allows you to check if - within a group - you are mutating the first row in that group. If it is the first row, then the value is set to the number of consecutive dates. If it is not the first row in the group, then the value is zero. This gives the same output as desired, with zeros for consecutive dates (except the first one). – Ben Nov 19 '21 at 02:18
  • I've tried your code again and it seems the output is not the same. I am only getting four columns (day, id, date, num_con_days_2) and it is not counting the consecutive days correctly – Jane Miller Nov 19 '21 at 03:44
  • I edited the answer and added `arrange`. Your post includes `arrange` by `user` (which is not in your example data - did you mean `id`?). It looks like this gives similar output to what your desired output is. One difference is that your desired output is not sorted by `id`. – Ben Nov 19 '21 at 06:16
  • @JaneMiller If you are not getting the same output - what output are you getting? Can you describe it further? If you are "only getting four columns", what columns are you missing? In your desired output in your example, you only have 4 columns of data. – Ben Nov 19 '21 at 06:18
  • @JaneMiller The edited answer also includes a second dataset with `day` instead of `date` - please note the format of your date has changed. – Ben Nov 19 '21 at 06:31