Give group ID for date periods

Question

I'm trying to automate the assignment of a group number to periods of time, because I'm writing a function to aggregate time series of weather data over different time periods defined by the user. Let's call "n" the number of sub-periods.

d1 = seq(as.Date("1910/1/1"), as.Date("1910/1/20"), "days")
d2 = seq(as.Date("1911/2/4"), as.Date("1911/2/27"), "days")
id1 = rep("1", length(d1))
id2 = rep("2", length(d2))       

df = data.frame(date = c(d1,d2), id = c(id1,id2))
df

I would like to cut my dates into a number "n" of periods and add the period number to each row of my data frame. Something like this, if I want 4 periods:

df$period = c(rep(c(1:4), each = length(d1)/4), rep(c(1:4), each = length(d2)/4))
df

Each ID has a different number of dates in my real data set. That is why I want to build the first groups with the same size and put the rest in the last one.

Let's imagine I want four periods. I wrote this, but it returns only "4":

df2 =df %>% 
  group_by(date,id) %>%
  mutate(period = c(rep(seq(1,4-1, by = 1), each = as.integer(length(date)/4)),
                    rep(4, length(date)-((4-1)*as.integer(length(date)/4))))) 
df2

Does anyone have an idea?

@hammoire:

So here, for example, the first ID has 20 dates, and if I want to cut it into 3 periods: c(1,1,1,1,1,1, 2,2,2,2,2,2, 3,3,3,3,3,3,3,3)

Could you show a data frame of your desired output? Just so I'm sure I'm on the right track — hammoire, Apr 09 '20 at 15:00
Create the desired vector of integers by hand, just write out how you want the final 'period' column to look. c(1,1,1,2,2,2,3,3,3) for example. Then paste it into the question. — hammoire, Apr 09 '20 at 15:31
I would like to have a period number associated with each date, for example 1 if the date is in the first "period" of dates. But if I ask, for example, for 4 periods and the number of dates in a group is not a multiple of 4, I would like to split all the dates into these 4 periods, with the last one made up of all the rest. For example, 21 dates into 4 "periods": 21/4 = 5.25, so the first 3 groups will have 5 dates each and the last group will have the 6 remaining. — cara mathias, Apr 09 '20 at 15:33

Gregor Thomas · Accepted Answer · 2020-04-09T16:03:53.657

I'd try this:

library(dplyr)

n_period = 4

df %>% 
  group_by(id) %>% 
  mutate(period = sort(rep_len(1:n_period, length.out = n())))
#          date id period
# 1  1910-01-01  1      1
# 2  1910-01-02  1      1
# 3  1910-01-03  1      1
# 4  1910-01-04  1      1
# 5  1910-01-05  1      1
# 6  1910-01-06  1      2
# 7  1910-01-07  1      2
# 8  1910-01-08  1      2
# 9  1910-01-09  1      2
# 10 1910-01-10  1      2
# 11 1910-01-11  1      3
# 12 1910-01-12  1      3
# 13 1910-01-13  1      3
# 14 1910-01-14  1      3
# 15 1910-01-15  1      3
# 16 1910-01-16  1      4
# 17 1910-01-17  1      4
# 18 1910-01-18  1      4
# 19 1910-01-19  1      4
# 20 1910-01-20  1      4
# ...
# 33 1911-02-16  2      3
# 34 1911-02-17  2      3
# 35 1911-02-18  2      3
# 36 1911-02-19  2      3
# 37 1911-02-20  2      3
# 38 1911-02-21  2      3
# 39 1911-02-22  2      4
# 40 1911-02-23  2      4
# 41 1911-02-24  2      4
# 42 1911-02-25  2      4
# 43 1911-02-26  2      4
# 44 1911-02-27  2      4

Any extras will be assigned to the groups in order, so if you had 7 dates and 4 periods, it would be 1, 1, 2, 2, 3, 3, 4
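
As a quick check of that behaviour, here is a minimal base R sketch that applies the same `sort(rep_len(...))` expression to plain lengths instead of a grouped data frame. The lengths 7 and 22 and the name `balanced` are only illustrative.

```r
n_period <- 4

# 7 rows: the three extras go to periods 1, 2 and 3
sort(rep_len(1:n_period, length.out = 7))
# [1] 1 1 2 2 3 3 4

# 22 rows: the two extras go to periods 1 and 2
balanced <- sort(rep_len(1:n_period, length.out = 22))
table(balanced)  # group sizes 6, 6, 5, 5
```

So with this approach the group sizes never differ by more than one, and the larger groups come first.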

Alternatively, if you want all extras in the last group (e.g., 7 entries in 4 periods give 1, 2, 3, 4, 4, 4, 4), this should work:

df %>% 
   group_by(id) %>% 
   mutate(period = c(rep(1:n_period, each = n() %/% n_period), rep(n_period, n() %% n_period)))
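
To see what this second expression produces outside of `mutate()`, here is a small sketch that wraps it in a helper. The name `rest_in_last` and its arguments are hypothetical and not part of the answer above; `n_rows` stands in for `n()`.

```r
# Hypothetical helper: equal-sized first groups, remainder added to the last one
rest_in_last <- function(n_rows, n_period) {
  c(rep(1:n_period, each = n_rows %/% n_period),  # n_period blocks of equal size
    rep(n_period, n_rows %% n_period))            # leftover rows join the last period
}

rest_in_last(7, 4)
# [1] 1 2 3 4 4 4 4

table(rest_in_last(21, 4))  # group sizes 5, 5, 5, 6
table(rest_in_last(20, 3))  # group sizes 6, 6, 8
```

The 21-date and 20-date cases match the sizes described in the question. One thing to keep in mind: when a group has fewer rows than `n_period`, every row ends up in the last period, for instance `rest_in_last(3, 4)` gives 4, 4, 4.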

desval · Answer 2 · 2020-04-09T16:07:01.703

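Using data.table (not very elegant, but it works):

# Assumed setup (not shown in the original answer): `d` is df as a data.table
# and `t` is the row index within each id, matching the output below.
library(data.table)
library(magrittr)  # provides %>%
d <- as.data.table(df)
d[, t := seq_len(.N), by = id]

d[, N := .N, by=id]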
d[, n := floor(N/4) ]
d[, j := mapply(function(N,n) seq(1, N, by=n) %>% list, N, n)]
d[, y := ifelse(t %in% unlist(j), 1, 0), by=id]
d[, y := cumsum(y), by=id]
d[, c("N","n","j") := NULL]
d

         date id  t y
 1: 1910-01-01  1  1 1
 2: 1910-01-02  1  2 1
 3: 1910-01-03  1  3 1
 4: 1910-01-04  1  4 1
 5: 1910-01-05  1  5 1
 6: 1910-01-06  1  6 2
 7: 1910-01-07  1  7 2
 8: 1910-01-08  1  8 2
 9: 1910-01-09  1  9 2
10: 1910-01-10  1 10 2
11: 1910-01-11  1 11 3
12: 1910-01-12  1 12 3
13: 1910-01-13  1 13 3
14: 1910-01-14  1 14 3
15: 1910-01-15  1 15 3
16: 1910-01-16  1 16 4
17: 1910-01-17  1 17 4
18: 1910-01-18  1 18 4
19: 1910-01-19  1 19 4
20: 1910-01-20  1 20 4
21: 1911-02-04  2  1 1
22: 1911-02-05  2  2 1
23: 1911-02-06  2  3 1
24: 1911-02-07  2  4 1
25: 1911-02-08  2  5 1
26: 1911-02-09  2  6 1
27: 1911-02-10  2  7 2
28: 1911-02-11  2  8 2
29: 1911-02-12  2  9 2
30: 1911-02-13  2 10 2
31: 1911-02-14  2 11 2
32: 1911-02-15  2 12 2
33: 1911-02-16  2 13 3
34: 1911-02-17  2 14 3
35: 1911-02-18  2 15 3
36: 1911-02-19  2 16 3
37: 1911-02-20  2 17 3
38: 1911-02-21  2 18 3
39: 1911-02-22  2 19 4
40: 1911-02-23  2 20 4
41: 1911-02-24  2 21 4
42: 1911-02-25  2 22 4
43: 1911-02-26  2 23 4
44: 1911-02-27  2 24 4
          date id  t y

Thank you, but it's not what I want, because here you ask for 4 periods but you end up with 5 for the first ID — cara mathias, Apr 09 '20 at 15:29
Oh, sorry, I thought you needed periods of 4, or fewer if it's the last one — desval, Apr 09 '20 at 15:37

hammoire · Answer 3 · 2020-04-10T14:09:35.453

Not sure if this is what you are after. The function lets you specify the number of groups, but I'm not sure whether you want the number of groups to be defined automatically for each id. Let me know if that is the case and I can try to modify it. Thanks

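# n specifies the number of desired groups

group_fun <- function(v, n) {
  len_v <- length(v)
  # size of each of the equal-sized groups
  n_per_group <- floor(len_v / n)
  output_temp <- sort(rep(1:n, times = n_per_group))
  # indexing past the end of output_temp yields NA for the leftover rows
  output <- output_temp[1:len_v]
  # leftover rows are assigned to the last group
  output[is.na(output)] <- max(output_temp, na.rm = TRUE)
  output
}

# only the length of the first argument matters, so any column works
group_fun(df$date[df$id == "1"], 3)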

df %>% 
  group_by(id) %>%
  mutate(period =  group_fun(id, n = 3))
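
As a rough check of the group sizes this function gives, here is a short base R sketch. It repeats `group_fun` so that it runs on its own, and rebuilds the 20-date vector `d1` from the question; the 21-element input is just the example from the comments.

```r
# Copy of group_fun from this answer, repeated so the snippet is self-contained
group_fun <- function(v, n) {
  len_v <- length(v)
  n_per_group <- floor(len_v / n)
  output_temp <- sort(rep(1:n, times = n_per_group))
  output <- output_temp[1:len_v]
  output[is.na(output)] <- max(output_temp, na.rm = TRUE)
  output
}

# 20 dates, as for the first id in the question
d1 <- seq(as.Date("1910/1/1"), as.Date("1910/1/20"), "days")

table(group_fun(d1, 3))    # group sizes 6, 6, 8
table(group_fun(1:21, 4))  # group sizes 5, 5, 5, 6
```

Both results put the leftover rows in the last group, which is the layout asked for in the question.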
