0

I've been trying to figure this out for a while, but haven't been able to do so. I found a lot of similar questions which didn't help at all.

I have around 43000 records in data frame in R. The date column is in the format "2011-11-15 02:00:01", and the other column is the count. The structure of the data frame:

str(results)
'data.frame':   43070 obs. of  2 variables:
 $ dates: Factor w/ 43070 levels "2011-11-15 02:00:01",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ count: num  1 2 1 1 1 1 2 3 1 2 ...

How can I get the total count per min?

And I also want to convert the results data frame into json. I used rjson package which converted the entire data frame as a single json element. When I inserted into mongodb, there was only on _id for all 43000 records. What did I do wrong?

user1828605
  • 1,723
  • 1
  • 24
  • 63

2 Answers2

2

You can use the xts package to get the counts/minute quite easily.

install.packages("xts")
require("xts")
results_xts <- xts(results$count, order.by = as.POSIXlt(results$dates))

This converts your dataframe to an xts object. There are a bunch of functions (apply.daily, apply.yearly, etc) in xts that apply functions to different time frames, but there isn't one for by minute. Fortunately the code for those functions is super simple, so just run

ep <- endpoints(results_xts, "minutes")
period.apply(results_xts, ep, FUN = sum)

Sorry, I don't know the answer to your other question.

Sean Hughes
  • 328
  • 3
  • 7
1

Asterisk here, untested, but here is my solution for getting the counts per minute, maybe someone will chime in on the json part, I'm not familiar with that

here's my example time series and count

tseq<-seq(now,length.out=130, by="sec")
count<-rep(1, 130)

we find the index of where our minutes switch via the following

mins<-c(0,diff(floor(cumsum(c(0,diff(tseq)))/60)))
indxs<-which(mins%in%1)

Let me break that down (as there are many things nested in there).

  1. First we diff over the time sequence, then add a 0 on the front because we lose an observation with diff
  2. Second, sum the diff-ed vector, giving us the seconds value in each spot (this could probably also be done by a simple format call over the vector of times too)
  3. Third, divide that vector, now the seconds in each spot, by 60 so we get a value in each spot corresponding to the minutes.
  4. Fourth, floor it so we get integers
  5. diff that vector so we get 0's except for 1's where the minute switches
  6. add a 0 to that vector since we lose an observation with the diff
  7. then get the indeces of the 1's with the which call

then we find the start and ends to our minutes

startpoints<-indxs
endpoints<-c(indxs[2:length(indxs)], length(mins))

then we simply sum over the corresponding subset

mapply(function(start, end) sum(count[start:end]), start=startpoints, end=endpoints)
#[1] 61 10

We get 61 for the first point because we include the 0th and 60th second for the first subset

DMT
  • 1,577
  • 10
  • 16