1

I have a column of POSIXlt times in a data frame, and I'm trying to count how many of these observations (in this case, bike rides) occur per day. What's the best way to do that?

The dates look like this:

> rides$start.fmtd[1:25]
 [1] "2014-01-01 00:06:00" "2014-01-01 00:11:00" "2014-01-01 00:12:00"
 [4] "2014-01-01 00:14:00" "2014-01-01 00:15:00" "2014-01-01 00:16:00"
 [7] "2014-01-01 00:16:00" "2014-01-01 00:19:00" "2014-01-01 00:20:00"
[10] "2014-01-01 00:20:00"

dput(head()) gives me this:

> dput(head(rides$start.fmtd))
structure(list(sec = c(0, 0, 0, 0, 0, 0), min = c(6L, 11L, 12L, 
14L, 15L, 16L), hour = c(0L, 0L, 0L, 0L, 0L, 0L), mday = c(1L, 
1L, 1L, 1L, 1L, 1L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(114L, 
114L, 114L, 114L, 114L, 114L), wday = c(3L, 3L, 3L, 3L, 3L, 3L
), yday = c(0L, 0L, 0L, 0L, 0L, 0L), isdst = c(0L, 0L, 0L, 0L, 
0L, 0L)), .Names = c("sec", "min", "hour", "mday", "mon", "year", 
"wday", "yday", "isdst"), class = c("POSIXlt", "POSIXt"))

This specific frame has about 300,000 observations (it's the Capital Bikeshare dataset, which contains every bike ride taken in the system, packaged quarterly).

Roger Filmyer
  • Can you provide a sample of your data by posting the output of `dput(head(yourDataFrame))`? `table(as.Date(yourDataFrame$posixLtVariable))` should work? – Jake Burkhead Jun 16 '14 at 02:28
  • `table(as.Date(frame$column))` works! But I have about 300,000 observations in the frame, so I can't get `dput()` to spit out a reasonable amount of data. – Roger Filmyer Jun 16 '14 at 03:03
  • @JakeBurkhead make that an answer. `as.Date()` lets me keep dates as the table labels, while `frame$yday` doesn't let me do it as easily. – Roger Filmyer Jun 16 '14 at 03:34

3 Answers

2

A POSIXlt object has a yday component (the zero-based day of the year), and you can use it to do a count with aggregate, by, table, or similar.

For example, suppose you have a data frame d with a POSIXlt column date and a numeric column count holding the number of observations in each row. If your data does not span more than one year, you can use yday alone:

aggregate(count ~ date$yday, data=d, FUN=sum)

If it spans more than one year (or just to be safe), you can also include the year (with any multiplier of at least 366, since yday ranges from 0 to 365):

aggregate(count ~ I(1000*date$year + date$yday), data=d, FUN=sum)
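As a minimal sketch of the same year-plus-yday key, here is a version that counts rows directly with table() instead of summing a count column. The starts vector is hypothetical sample data, not the asker's frame, and the UTC time zone is an assumption made to keep the day boundaries fixed.

```r
# Hypothetical sample: five ride start times spanning a year boundary.
starts <- as.POSIXlt(c("2013-12-31 23:50:00", "2014-01-01 00:06:00",
                       "2014-01-01 00:11:00", "2014-01-02 08:00:00",
                       "2014-01-02 09:30:00"), tz = "UTC")

# year is years since 1900 and yday is zero-based,
# so 2014-01-01 has year 114 and yday 0.
day_key <- 1000 * starts$year + starts$yday

# One count per distinct (year, day-of-year) key.
rides_per_day <- table(day_key)
rides_per_day
```

The labels of the resulting table are the numeric keys rather than readable dates, which is the trade-off the asker mentions in the comments when comparing this with as.Date().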
Matthew Lundberg
2
dates <- as.POSIXlt(runif(10, 0, 60 * 60 * 24 * 7), origin = Sys.Date())
dates
## [1] "2014-06-16 03:36:13 PDT" "2014-06-15 22:39:41 PDT"
## [3] "2014-06-19 12:25:11 PDT" "2014-06-17 09:31:45 PDT"
## [5] "2014-06-20 02:20:00 PDT" "2014-06-18 04:36:48 PDT"
## [7] "2014-06-19 17:33:35 PDT" "2014-06-21 15:38:24 PDT"
## [9] "2014-06-17 08:50:45 PDT" "2014-06-20 03:36:38 PDT"

class(dates)
## [1] "POSIXlt" "POSIXt"

table(as.Date(dates))
## 2014-06-15 2014-06-16 2014-06-17 2014-06-18 2014-06-19 2014-06-20 2014-06-21
##          1          1          2          1          2          2          1
Jake Burkhead
1

If you have values with dates and times, you can format them to just have the date and use table() on those values to get counts.

#sample data
set.seed(15)
randomdates <- structure(runif(30, 1357016400, 1359608400), 
    class=c("POSIXct", "POSIXt"), tzone="")

Now count the values per date:

table(strftime(randomdates, "%Y-%m-%d"))

The only downside to this is that table() stores the dates as character labels. You can convert them back with:

tbl <- table(strftime(randomdates, "%Y-%m-%d"))
as.POSIXct(names(tbl))
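If a tidy result is more convenient than a table, the labels and counts can be put into a data frame. This is a sketch under assumptions: the stamps vector is hypothetical sample data, the time zone is fixed to UTC for reproducibility, and it converts the labels with as.Date() rather than as.POSIXct(), since only the calendar day remains after formatting.

```r
# Hypothetical sample: four timestamps on two calendar days (UTC).
stamps <- as.POSIXct(c("2013-01-01 10:00:00", "2013-01-01 18:30:00",
                       "2013-01-02 07:15:00", "2013-01-01 23:59:00"),
                     tz = "UTC")

# Format with an explicit time zone so the day boundaries are unambiguous.
tbl <- table(format(stamps, "%Y-%m-%d", tz = "UTC"))

# Rebuild a data frame whose day column is a Date rather than character.
per_day <- data.frame(day = as.Date(names(tbl)),
                      n = as.vector(tbl))
per_day
```

Keeping day as a Date means the result sorts and plots chronologically without further conversion.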
MrFlick