Summarizing intervals of missing data in POSIX time series

Question

I have a time series of hourly precipitation data that I am running through a QA/QC routine. One thing I would like to do is create a histogram of the number of missing-data intervals, grouped by how long each interval lasts: how many periods are missing one hour of data, how many are missing two consecutive hours, how many three, and so on. I could probably do this with some nested loops, but I wonder if there is a better way.

The time series is continuous (all hours are represented). datetime is POSIXct; the data are numeric, with NA for missing values. A short sample can be created with this:

precip <- structure(list(datetime = structure(c(1114905600, 1114909200, 
1114912800, 1114916400, 1114920000, 1114923600, 1114927200, 1114930800, 
1114934400, 1114938000, 1114941600, 1114945200, 1114948800), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), precip = c(1.1, NA, 2, 0, NA, NA, 
NA, 0, 0, NA, NA, 0.5, 0.3)), .Names = c("datetime", "precip"
), row.names = c(NA, -13L), class = "data.frame")

The output should recognize one one-hour period, one two-hour period, and one three-hour period as missing data. Thanks!

score 1 · Accepted Answer · answered Nov 14 '13 at 22:02

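Using rle (run-length encoding) on is.na(precip$precip): each run of TRUE is one interval of missing data, and the run's length is the number of consecutive missing hours: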

R> rle_res <- as.data.frame(unclass(rle(is.na(precip$precip))))  # columns: lengths, values
R> rle_na <- subset(rle_res, values == TRUE)  # keep only the runs of NA
R> table(rle_na$lengths)  # number of gaps for each gap length (hours)
1 2 3 
1 1 1
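
If it is also useful to know when each gap starts, the same rle result can be mapped back onto the timestamps. The following is only a sketch under some assumptions: it rebuilds the sample data in a compact form, relies on the series having no missing rows (as stated in the question), and the names runs, run_start, gaps, and gap_counts are illustrative rather than taken from the original post.

```r
# Sample data from the question, rebuilt compactly: 13 hourly steps in UTC.
precip <- data.frame(
  datetime = as.POSIXct(1114905600, origin = "1970-01-01", tz = "UTC") + 3600 * (0:12),
  precip   = c(1.1, NA, 2, 0, NA, NA, NA, 0, 0, NA, NA, 0.5, 0.3)
)

# Run-length encode the missing-value indicator.
runs <- rle(is.na(precip$precip))

# Row index at which each run ends and starts.
run_end   <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1

# Keep only the runs where values is TRUE, i.e. the runs of NA.
gaps <- data.frame(
  start = precip$datetime[run_start[runs$values]],
  hours = runs$lengths[runs$values]
)

# Number of gaps for each gap length.
gap_counts <- table(gaps$hours)
```

Here gaps has one row per missing interval (start time and length in hours), and gap_counts is the same table as above. Passing gap_counts to barplot() would draw the histogram asked for in the question.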

rcs
