Carry / use value from previous group

Question

data:

structure(list(id = c(1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 5), 
               ax = c("a", "a", "b", "b", "b", "b", "b", "b", "c", "c", 
                      "d", "d", "e"), time = c(1, 3, 0, 2, 4, 5, 6, 8, 7, 9, 10, 
                                               11, 12)), .Names = c("id", "ax", "time"), class = c("data.table", 
                                                                                                   "data.frame"), row.names = c(NA, -13L))

looks like:

    id ax time
 1:  1  a    1
 2:  1  a    3
 3:  2  b    0
 4:  2  b    2
 5:  2  b    4
 6:  2  b    5
 7:  2  b    6
 8:  2  b    8
 9:  3  c    7
10:  3  c    9
11:  4  d   10
12:  4  d   11
13:  5  e   12

I want to have the max of the previous group next to the actual group:

desired output:

    id ax time newCol
 1:  1  a    1     NA
 2:  1  a    3     NA
 3:  2  b    0      3
 4:  2  b    2      3
 5:  2  b    4      3
 6:  2  b    5      3
 7:  2  b    6      3
 8:  2  b    8      3
 9:  3  c    7      8
10:  3  c    9      8
11:  4  d   10      9
12:  4  d   11      9
13:  5  e   12     11

Is it also possible to have the value of the "previous-previous" grp?

Interessted in baseR, data.table and tidyverse solutions

note:

Can be grouped by EITHER id or ax. The example is a little redundant here.

score 4 · Accepted Answer · answered Apr 10 '18 at 13:37

A data.table solution:

dtt.max <- dtt[, .(max = max(time)), by = ax]
dtt.max[, max.prev := shift(max)]

dtt[dtt.max, newCol := i.max.prev, on = 'ax']
# > dtt
#     id ax time newCol
#  1:  1  a    1     NA
#  2:  1  a    3     NA
#  3:  2  b    0      3
#  4:  2  b    2      3
#  5:  2  b    4      3
#  6:  2  b    5      3
#  7:  2  b    6      3
#  8:  2  b    8      3
#  9:  3  c    7      8
# 10:  3  c    9      8
# 11:  4  d   10      9
# 12:  4  d   11      9
# 13:  5  e   12     11

score 3 · Answer 2 · answered Apr 10 '18 at 13:38

3

data.table solution using id + 1

library(data.table)
merge(d, setDT(d)[, max(time), id + 1], all.x = TRUE)

answered Apr 10 '18 at 13:38

pogibas

27,303
19
84
117

have thought of something similar then shift/roll the id variable. – Andre Elrico Apr 10 '18 at 13:40
@AndreElrico If `id` is numeric then simply adding integer is a straightforward solution. – pogibas Apr 10 '18 at 13:41
1

sure in that case. But could of course be possible that an id is missing etc... anyways good stuff!!! – Andre Elrico Apr 10 '18 at 13:42
somehow i get the following error: `Error in gmax(time) : grpn [13] != length(x) [11] in gmax` – Andre Elrico Apr 10 '18 at 13:47
@AndreElrico I'm using `data.table 1.10.4.3` – pogibas Apr 10 '18 at 13:48
To add it by reference instead of making a new table: `DT[DT[, max(time), by=id+1], on=.(id), v := i.V1]` or `DT[DT[, max(time), by=id][, id := shift(id, type="lead")], on=.(id), v := i.V1]` – Frank Apr 10 '18 at 15:17

score 2 · Answer 3 · answered Apr 10 '18 at 13:35

Here is a dplyr approach. The key here is to group and ungroup when necessary:

df %>% 
  group_by(ax) %>% 
  mutate(new = time[n()]) %>%
  ungroup() %>%
  mutate(new = lag(new)) %>%
  group_by(ax) %>%
  mutate(new = new[1])

# A tibble: 13 x 4
# Groups:   ax [5]
      id ax     time   new
   <dbl> <chr> <dbl> <dbl>
 1    1. a        1.   NA 
 2    1. a        3.   NA 
 3    2. b        0.    3.
 4    2. b        2.    3.
 5    2. b        4.    3.
 6    2. b        5.    3.
 7    2. b        6.    3.
 8    2. b        8.    3.
 9    3. c        7.    8.
10    3. c        9.    8.
11    4. d       10.    9.
12    4. d       11.    9.
13    5. e       12.   11.

score 1 · Answer 4 · answered Apr 10 '18 at 13:37

Assuming id is the same as group:

dfr <- dfr %>% group_by(id) %>% mutate(groupmax = max(time))
dfr$old_group_max <- dfr$groupmax[match(dfr$id - 1, dfr$id)]

The antepenultimate group is left as an exercise :-)

G. Grothendieck · Answer 5 · 2018-04-10T17:57:40.790

1) This uses no packages. It computes the maximum for each group giving Ag and and then lags it giving LagMax. Finally it left joins using merge that back into the original data frame DF:

Ag <- aggregate(time ~ id, DF, max)
LagMax <- transform(Ag, lagmax = c(NA, head(time, -1)), time = NULL)
merge(DF, LagMax, by = "id", all.x = TRUE)

giving:

   id ax time lagmax
1   1  a    1     NA
2   1  a    3     NA
3   2  b    0      3
4   2  b    2      3
5   2  b    4      3
6   2  b    5      3
7   2  b    6      3
8   2  b    8      3
9   3  c    7      8
10  3  c    9      8
11  4  d   10      9
12  4  d   11      9
13  5  e   12     11

2) This sorts time within id so that we know that the maximum is the last value in each id group.

o <- order(factor(DF$id, levels = unique(DF$id)), DF$time)
Time <- DF$time[o]
lagmax <- function(r) if (r[1] == 1) NA else Time[r[1] - 1]
transform(DF, lagmax = ave(seq_along(id), id, FUN = lagmax))

In the question the time values are already sorted within id and if that is known to be the case the above could be shortened to:

lagmax <- function(r) if (r[1] == 1) NA else DF$time[r[1] - 1]
transform(DF, lagmax = ave(seq_along(id), id, FUN = lagmax))

3) This one-liner is a data.table translation of (2):

library(data.table)
DT <- copy(DF) # don't overwrite DF

setDT(DT)[, g:=rleid(id)][, lagmax := DT$time[.I[1]-1], keyby = c("g", "id")]

In the sample data in the question time is sorted within id and if that were known to be the case we could use the following shorter code in place of the last line above

setDT(DT)[, lagmax := DT$time[.I[1]-1], by = id]

Carry / use value from previous group

note:

5 Answers5