Delete data with gaps

Question

I want to delete the data for any id that has gaps between its minimum and maximum time period. Each id can start and end in any time period; that is fine. I just want to keep the ids that have no missing time periods between their min and max time.

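library(data.table)
set.seed(5)
data<-data.table(y=rnorm(100))
data[sample(1:100, 40),]<-NA
# 10 ids with 10 time periods each
id = rep(1:10, each = 10)
time = seq(1,10)
data2<-data.frame(id,time)
data2$row<-1:nrow(data2)
# drop a block of consecutive rows (55 to 61)
data2a<-subset(data2,row<55|row>61 )
# drop 5 more rows at random, which can leave gaps inside an id
data3<-data2a[-sample(nrow(data2a), 5),]
data.table(data3)
# rows remaining per id (base R; count() would need plyr loaded)
table(data3$id)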

This is a good example: group 1 should be deleted, for instance, but group 6 should not.

For future reference, there's never a reason to delete a question or its title because it's answered (that would prevent it from being useful to future searchers). — David Robinson, Aug 13 '15 at 03:16

score 2 · Accepted Answer · answered Aug 13 '15 at 02:52

The condition you want to filter for is that there are no gaps greater than 1. diff(time) gives you the gaps, so all(diff(time) == 1) checks the condition.
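To make the check concrete, here is a minimal base R sketch on a few hand-made vectors. The helper name has_no_gaps is hypothetical (it is not part of any package), and the sketch assumes the times are whole numbers sorted in increasing order within each id, as they are in the question's data.

```r
# Hypothetical helper: TRUE when consecutive times always differ by exactly 1
has_no_gaps <- function(time) all(diff(time) == 1)

contiguous <- has_no_gaps(c(3L, 4L, 5L))   # diffs are 1, 1
gappy      <- has_no_gaps(c(1L, 2L, 4L))   # diffs are 1, 2
single     <- has_no_gaps(7L)              # diff() is empty, and all() of nothing is TRUE

# The check depends on order, so unsorted times should be sorted first
unsorted   <- has_no_gaps(c(2L, 1L, 3L))        # diffs are -1, 2
sorted     <- has_no_gaps(sort(c(2L, 1L, 3L)))  # diffs are 1, 1
```

Note that an id with a single observation passes the check, since it has no gaps by definition.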

You can thus do this with:

library(dplyr)
data3 %>%
    group_by(id) %>%
    filter(all(diff(time) == 1))

In data.table, one solution (that does the same thing) is:

setDT(data3)[, .SD[all(diff(time) == 1)], id]
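If neither package is available, the same filter can probably be expressed in base R with ave(). This is only a sketch on a small made-up data frame (df and its values are assumptions for illustration, not the question's data3), and it again assumes time is sorted within each id.

```r
# Made-up data: id 1 is missing time 3, id 2 is contiguous
df <- data.frame(id   = c(1, 1, 1, 2, 2, 2),
                 time = c(1L, 2L, 4L, 5L, 6L, 7L))

# ave() runs the check once per id and recycles the single result
# across that id's rows (stored as 1 for TRUE, 0 for FALSE)
keep <- ave(df$time, df$id, FUN = function(t) all(diff(t) == 1))

# Keep only the rows belonging to ids without gaps
result <- df[keep == 1, ]
```

Here result should contain only the three rows of id 2.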

score 0 · Answer 2 · jeremycg · 2015-08-13T03:00:48.750

Using dplyr:

library(dplyr)
data3 %>% group_by(id) %>%
          filter(identical(time, seq(first(time), last(time))))
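One caveat with identical() is that it also compares storage type. In the question's data, time comes from seq(1,10) and is stored as integer, so the comparison works there. The sketch below, on made-up vectors, shows what happens if the same values are stored as doubles instead.

```r
time_int <- c(1L, 2L, 3L)   # integer storage, like seq(1, 10) in the question
time_dbl <- c(1, 2, 3)      # same values stored as double

# seq(from, to) with whole-number endpoints returns an integer vector
int_match <- identical(time_int, seq(time_int[1], time_int[3]))
dbl_match <- identical(time_dbl, seq(time_dbl[1], time_dbl[3]))

# The diff-based check compares values only, so storage type does not matter
dbl_diff  <- all(diff(time_dbl) == 1)
```

With these inputs int_match is TRUE but dbl_match is FALSE even though there is no gap, while dbl_diff is TRUE. Converting with as.integer(time) first, or using the diff() check, avoids the mismatch.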

edited Aug 13 '15 at 03:00

answered Aug 13 '15 at 02:51

