2

I have data grouped using dplyr in R. I would like to find the 'date' after the last occurrence of observations ('B') equal to or greater than 1 (1, 2, 3 or 4) in each group ('A'). In other words, the 'date' where 1/2/3/4 has turned to 0.

Simply finding the date for the first occurrence of 0 will not work as in some groups 1/2/3/4 switches to 0 and then back again and does not give the result I'd like.

I would like this 'date' for each group to be given in a new column ('date.after').

For example, given the following sample of data, grouped by A (this has been simplified, my data is actually grouped by 3 variables):

A  B  date
a  2  1
a  2  2
a  1  5
a  0  8
b  3  1
b  3  4
b  3  6 
b  0  7 
b  0  9
c  1  2 
c  1  3
c  1  4

I would like to achieve the following:

A  B  date date.after
a  2  1    8
a  2  2    8
a  1  5    8
a  0  8    8
b  3  1    7
b  3  4    7
b  3  6    7
b  0  7    7
b  0  9    7
c  1  2    NA
c  1  3    NA
c  1  4    NA

I hope this makes sense, thank you all very much for your help! This post may look familiar, I have just asked a very similar question:

How to find the last occurrence of a certain observation in grouped data in R?

Community
  • 1
  • 1
Keeley Seymour
  • 281
  • 3
  • 10

2 Answers2

5

Here's a dplyr option:

df %>% group_by(A) %>% mutate(date_after = date[last(which(B >= 1)) + 1])
#Source: local data frame [12 x 4]
#Groups: A [3]
#
#        A     B  date date_after
#   (fctr) (int) (int)      (int)
#1       a     2     1          8
#2       a     2     2          8
#3       a     1     5          8
#4       a     0     8          8
#5       b     3     1          7
#6       b     3     4          7
#7       b     3     6          7
#8       b     0     7          7
#9       b     0     9          7
#10      c     1     2         NA
#11      c     1     3         NA
#12      c     1     4         NA

Alternatively, you could use dplyr's nth function:

df %>% group_by(A) %>% mutate(date_after = nth(date, last(which(B >= 1)) + 1))

What it does (in both cases): It computes the position of the last entry of B equal to or greater than 1, then adds 1 to that index and returns date of that position. It returns NA if that position is not available (as is the case in the last group).

You can do exactly the same in data.table using:

library(data.table)
setDT(df)[, date_after := date[last(which(B >= 1)) + 1], by = A]
talat
  • 68,970
  • 21
  • 126
  • 157
  • Nice, here is an equivalent base R solution `do.call(rbind, lapply(split(data, data$A), function(x){ x$date.after = x$date[tail(which(x$B >= 1),1) + 1]; x}))` – Veerendra Gadekar Feb 26 '16 at 15:33
  • 1
    @VeerendraGadekar, thanks, that's a good suggestion except you would need to replace `last` with `tail(..., 1)` because `last` are functions in `dplyr` and `data.table`. – talat Feb 26 '16 at 15:35
0

I went with dplyr since I think the code is easier to read than data.table

library(dplyr)
df %>%
  group_by(A) %>%
  mutate(
    Date0 = date[B == 0][1]
  )
JackStat
  • 1,593
  • 1
  • 11
  • 17
  • A quote from the Q: "Simply finding the date for the first occurrence of 0 will not work " – talat Feb 26 '16 at 15:25