8

data:

structure(list(id = c(1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 5), 
               ax = c("a", "a", "b", "b", "b", "b", "b", "b", "c", "c", 
                      "d", "d", "e"), time = c(1, 3, 0, 2, 4, 5, 6, 8, 7, 9, 10, 
                                               11, 12)), .Names = c("id", "ax", "time"), class = c("data.table", 
                                                                                                   "data.frame"), row.names = c(NA, -13L))

looks like:

    id ax time
 1:  1  a    1
 2:  1  a    3
 3:  2  b    0
 4:  2  b    2
 5:  2  b    4
 6:  2  b    5
 7:  2  b    6
 8:  2  b    8
 9:  3  c    7
10:  3  c    9
11:  4  d   10
12:  4  d   11
13:  5  e   12

I want to have the max of the previous group next to the actual group:

desired output:

    id ax time newCol
 1:  1  a    1     NA
 2:  1  a    3     NA
 3:  2  b    0      3
 4:  2  b    2      3
 5:  2  b    4      3
 6:  2  b    5      3
 7:  2  b    6      3
 8:  2  b    8      3
 9:  3  c    7      8
10:  3  c    9      8
11:  4  d   10      9
12:  4  d   11      9
13:  5  e   12     11

Is it also possible to have the value of the "previous-previous" grp?

Interessted in baseR, data.table and tidyverse solutions

note:

Can be grouped by EITHER id or ax. The example is a little redundant here.

Community
  • 1
  • 1
Andre Elrico
  • 10,956
  • 6
  • 50
  • 69

5 Answers5

4

A data.table solution:

dtt.max <- dtt[, .(max = max(time)), by = ax]
dtt.max[, max.prev := shift(max)]

dtt[dtt.max, newCol := i.max.prev, on = 'ax']
# > dtt
#     id ax time newCol
#  1:  1  a    1     NA
#  2:  1  a    3     NA
#  3:  2  b    0      3
#  4:  2  b    2      3
#  5:  2  b    4      3
#  6:  2  b    5      3
#  7:  2  b    6      3
#  8:  2  b    8      3
#  9:  3  c    7      8
# 10:  3  c    9      8
# 11:  4  d   10      9
# 12:  4  d   11      9
# 13:  5  e   12     11
mt1022
  • 16,834
  • 5
  • 48
  • 71
3

data.table solution using id + 1

library(data.table)
merge(d, setDT(d)[, max(time), id + 1], all.x = TRUE)
pogibas
  • 27,303
  • 19
  • 84
  • 117
2

Here is a dplyr approach. The key here is to group and ungroup when necessary:

df %>% 
  group_by(ax) %>% 
  mutate(new = time[n()]) %>%
  ungroup() %>%
  mutate(new = lag(new)) %>%
  group_by(ax) %>%
  mutate(new = new[1])

# A tibble: 13 x 4
# Groups:   ax [5]
      id ax     time   new
   <dbl> <chr> <dbl> <dbl>
 1    1. a        1.   NA 
 2    1. a        3.   NA 
 3    2. b        0.    3.
 4    2. b        2.    3.
 5    2. b        4.    3.
 6    2. b        5.    3.
 7    2. b        6.    3.
 8    2. b        8.    3.
 9    3. c        7.    8.
10    3. c        9.    8.
11    4. d       10.    9.
12    4. d       11.    9.
13    5. e       12.   11.
Martin Schmelzer
  • 23,283
  • 6
  • 73
  • 98
1

Assuming id is the same as group:

dfr <- dfr %>% group_by(id) %>% mutate(groupmax = max(time))
dfr$old_group_max <- dfr$groupmax[match(dfr$id - 1, dfr$id)]

The antepenultimate group is left as an exercise :-)

1

1) This uses no packages. It computes the maximum for each group giving Ag and and then lags it giving LagMax. Finally it left joins using merge that back into the original data frame DF:

Ag <- aggregate(time ~ id, DF, max)
LagMax <- transform(Ag, lagmax = c(NA, head(time, -1)), time = NULL)
merge(DF, LagMax, by = "id", all.x = TRUE)

giving:

   id ax time lagmax
1   1  a    1     NA
2   1  a    3     NA
3   2  b    0      3
4   2  b    2      3
5   2  b    4      3
6   2  b    5      3
7   2  b    6      3
8   2  b    8      3
9   3  c    7      8
10  3  c    9      8
11  4  d   10      9
12  4  d   11      9
13  5  e   12     11

2) This sorts time within id so that we know that the maximum is the last value in each id group.

o <- order(factor(DF$id, levels = unique(DF$id)), DF$time)
Time <- DF$time[o]
lagmax <- function(r) if (r[1] == 1) NA else Time[r[1] - 1]
transform(DF, lagmax = ave(seq_along(id), id, FUN = lagmax))

In the question the time values are already sorted within id and if that is known to be the case the above could be shortened to:

lagmax <- function(r) if (r[1] == 1) NA else DF$time[r[1] - 1]
transform(DF, lagmax = ave(seq_along(id), id, FUN = lagmax))

3) This one-liner is a data.table translation of (2):

library(data.table)
DT <- copy(DF) # don't overwrite DF

setDT(DT)[, g:=rleid(id)][, lagmax := DT$time[.I[1]-1], keyby = c("g", "id")]

In the sample data in the question time is sorted within id and if that were known to be the case we could use the following shorter code in place of the last line above

setDT(DT)[, lagmax := DT$time[.I[1]-1], by = id]
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341