I have a time series of continuous data measured at 10 minute intervals for a period of five months. For simplicity's sake, the data is available in two columns as follows:

Timestamp   Temp.Diff
2/14/2011 19:00 -0.385
2/14/2011 19:10 -0.535
2/14/2011 19:20 -0.484
2/14/2011 19:30 -0.409
2/14/2011 19:40 -0.385
2/14/2011 19:50 -0.215

... And it goes on for the next five months. I have parsed the Timestamp column using as.POSIXct.

I want to select rows that fall within certain times of the day (e.g. from 12 noon to 3 PM). I would like either to exclude the other hours of the day, or to extract just those 3 hours while keeping the data in sequential order (i.e. still as a time series).

Henrik
lhmv

2 Answers

You seem to know the basic idea, but are just missing the details. As you mentioned, we just transform the Timestamps into POSIX objects then subset.

lubridate Solution

The easiest way is probably with lubridate. First load the package:

library(lubridate)

Next convert the timestamp:

##*m*onth *d*ay *y*ear _ *h*our *m*inute
d = mdy_hm(dd$Timestamp)

Then we select what we want. In this case, I want any rows with a time after 7:30 PM (regardless of the day):

dd[hour(d) == 19 & minute(d) > 30 | hour(d) >= 20,]

Base R solution

First create a lower limit:

lower = strptime("2/14/2011 19:30","%m/%d/%Y %H:%M")

Next transform the Timestamps into POSIX objects:

d = strptime(dd$Timestamp, "%m/%d/%Y %H:%M")

Finally, a bit of dataframe subsetting:

dd[format(d,"%H:%M") > format(lower,"%H:%M"),]

Thanks to plannapus for this last part
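
The same string comparison can be extended to a window with both a lower and an upper bound, such as the noon to 3 PM range in the question. The following is only a minimal sketch: the data frame `dd2`, its values, and the `UTC` time zone are assumptions made up for illustration, not part of the original data.

```r
# Hypothetical sample data covering the window of interest
dd2 <- data.frame(
  Timestamp = c("2/14/2011 11:50", "2/14/2011 12:00", "2/14/2011 13:30",
                "2/14/2011 15:00", "2/14/2011 15:10", "2/15/2011 12:10"),
  Temp.Diff = c(-0.385, -0.535, -0.484, -0.409, -0.385, -0.215),
  stringsAsFactors = FALSE
)

# Parse the timestamps; the fixed time zone is an assumption for reproducibility
d2 <- as.POSIXct(dd2$Timestamp, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Zero-padded "HH:MM" strings sort chronologically,
# so plain string comparison works for a same-day window
hm <- format(d2, "%H:%M")

# Keep rows from 12:00 to 15:00 inclusive, on any day, in their original order
noon_to_three <- dd2[hm >= "12:00" & hm <= "15:00", ]
```

Because the subset preserves row order, the result is still sequential in time, only with the other hours of each day dropped.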


Data for the above example:

dd = read.table(textConnection('Timestamp Temp.Diff
"2/14/2011 19:00" -0.385
"2/14/2011 19:10" -0.535
"2/14/2011 19:20" -0.484
"2/14/2011 19:30" -0.409
"2/14/2011 19:40" -0.385
"2/14/2011 19:50" -0.215'), header=TRUE)
Uwe
csgillespie
  • This is great - Thank you so much! If I wish to look at periods crossing the boundary between two separate days, say for instance nightly data from 23:00 hrs to 05:00 hrs, I suppose I can apply the same type of formatting but with two sets of upper and lower limits instead? – lhmv Oct 16 '12 at 03:13
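
For a window that crosses midnight, such as the 23:00 to 05:00 range asked about in the comment above, one possible approach is to join the two limits with OR instead of AND, since a time is inside the window if it is late enough or early enough. This is a rough sketch in base R; the data frame `night`, its values, and the `UTC` time zone are hypothetical.

```r
# Hypothetical timestamps around midnight
stamps <- as.POSIXct(c("2/14/2011 22:50", "2/14/2011 23:00", "2/15/2011 00:30",
                       "2/15/2011 05:00", "2/15/2011 05:10", "2/15/2011 12:00"),
                     format = "%m/%d/%Y %H:%M", tz = "UTC")
night <- data.frame(Timestamp = stamps,
                    Temp.Diff = c(-0.385, -0.535, -0.484, -0.409, -0.385, -0.215))

# Time of day as a zero-padded "HH:MM" string
hm <- format(night$Timestamp, "%H:%M")

# The window wraps past midnight, so a row qualifies if it is
# at or after 23:00 OR at or before 05:00
overnight <- night[hm >= "23:00" | hm <= "05:00", ]
```

With AND in place of OR the condition could never be true, because no single time of day is both after 23:00 and before 05:00.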

You can do this easily with the time-based subsetting in the xts package. Assuming your data.frame is named Data and its Timestamp column has already been converted to POSIXct:

library(xts)
x <- xts(Data$Temp.Diff, Data$Timestamp)
y <- x["T12:00/T15:00"]
# you need the leading zero if the hour is a single digit
z <- x["T09:00/T12:00"]
Joshua Ulrich