I struggle with dates and times in R, but I am hoping this is a fairly basic task.

Here is my dataset:

> str(temp.df)
'data.frame':   74602 obs. of  2 variables:
 $ time : POSIXct, format: "2011-04-09 03:53:20" "2011-04-09 03:53:15" "2011-04-09 03:53:07" "2011-04-09 03:52:39" ...
 $ value: num  1 1 1 1 1 1 1 1 1 1 ...

> head(temp.df$time, n=10)
 [1] "2011-04-09 03:53:20 EDT" "2011-04-09 03:53:15 EDT" "2011-04-09 03:53:07 EDT" "2011-04-09 03:52:39 EDT"
 [5] "2011-04-09 03:52:29 EDT" "2011-04-09 03:51:56 EDT" "2011-04-09 03:51:54 EDT" "2011-04-09 03:51:46 EDT"
 [9] "2011-04-09 03:51:44 EDT" "2011-04-09 03:51:26 EDT"

and for convenience...

> dput(head(temp.df$time, n=10))
structure(c(1302335600, 1302335595, 1302335587, 1302335559, 1302335549, 
1302335516, 1302335514, 1302335506, 1302335504, 1302335486), class = c("POSIXct", 
"POSIXt"), tzone = "")

What I am looking to do:

  • How can I find how many hours are between the min and max date/time?
  • What's the best way to create summaries of my data using 1-hour time buckets?

Any help you can provide will be greatly appreciated.

Btibert3
  • Look at the (excellent) vignette for package zoo -- it is in there. – Dirk Eddelbuettel Apr 11 '11 at 16:05
  • Personally, I've found that avoiding time in general is sometimes easier than trying to get it into an R friendly format. I split the date into columns and work with raw numbers instead that refer to day, month, year, hour, minute, second. – Brandon Bertelsen Apr 12 '11 at 05:42

1 Answer

Use the proper time series packages zoo and/or xts. This example is straight from the help page of aggregate.zoo(); it aggregates POSIXct seconds data into 10-minute buckets:

 library(zoo)

 # aggregate POSIXct seconds data every 10 minutes
 tt <- seq(10, 2000, 10)
 x <- zoo(tt, structure(tt, class = c("POSIXt", "POSIXct")))
 aggregate(x, time(x) - as.numeric(time(x)) %% 600, mean)

The to.period() function in xts is also a sure winner. There are countless examples here on SO and on the r-sig-finance list.
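
For the two questions asked, here is a minimal sketch applying the same idea to the sample timestamps from the question's dput() output. It is not part of the help-page example: the names span_hours, z and hourly are made up for illustration, the value column is assumed to be all 1s as in the str() output, and sum is just one possible summary function. Flooring with %% 3600 works on epoch seconds, so the buckets line up with clock hours only when the time zone has a whole-hour UTC offset.

```r
library(zoo)

# First ten timestamps from the question's dput() output
tm <- structure(c(1302335600, 1302335595, 1302335587, 1302335559, 1302335549,
                  1302335516, 1302335514, 1302335506, 1302335504, 1302335486),
                class = c("POSIXct", "POSIXt"), tzone = "")
temp.df <- data.frame(time = tm, value = 1)  # value assumed to be all 1s

# Hours between the earliest and latest timestamp
span_hours <- as.numeric(difftime(max(temp.df$time), min(temp.df$time),
                                  units = "hours"))

# zoo() sorts by the index; duplicated timestamps would need handling first
z <- zoo(temp.df$value, order.by = temp.df$time)

# Floor each timestamp to the start of its hour (3600 s), then sum per bucket
hourly <- aggregate(z, time(z) - as.numeric(time(z)) %% 3600, sum)
```

Replacing 3600 with 600 gives back the 10-minute buckets of the help-page example, and swapping sum for mean, length or another function changes the summary computed per bucket.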

C.d.
Dirk Eddelbuettel
  • Could you please clarify what your mod 600 is doing? The buckets are every hour, this I see with variable x. All that the aggregate line seems to do to x for my data is add 47 seconds to the start and end of each bucket. What's the point of this? – Frikster Jul 27 '15 at 17:36
  • See the comment: `# aggregate POSIXct seconds data every 10 minutes` – Dirk Eddelbuettel Jul 27 '15 at 17:38
  • cut(time(x), breaks="10 mins") is a great way to simplify that second parameter to the aggregate function. There is an example of doing it this way under the documentation for the aggregate function in the zoo package. https://cran.r-project.org/web/packages/zoo/zoo.pdf – JHowIX Jan 15 '16 at 21:19