I'm new to programming in R (and for that matter, programming at all...) and I'm trying to do some data analysis for a project for my class. I have some data that looks like this:
Id | Time | Heartrate |
---|---|---|
1341231 | 2016-04-12 07:23:30 | 95 |
1341231 | 2016-04-12 07:23:40 | 101 |
1341231 | 2016-04-12 07:23:50 | 92 |
1341231 | 2016-04-12 07:24:00 | 87 |
2342383 | 2016-04-12 07:23:30 | 60 |
This is data from wearable fitness trackers, recorded at 5- or 10-second intervals. It's a fairly large dataset, with more than 2 million rows. What I would like to do is: for each Id (i.e., each user), summarize the per-second data by hour, returning the average heart rate for each hour. So I’d like output that looks something like this:
Id | Time | Heartrate |
---|---|---|
1341231 | 2016-04-12 07:00 | 95 |
1341231 | 2016-04-12 08:00 | 82 |
1341231 | 2016-04-12 09:00 | 80 |
1341231 | 2016-04-12 10:00 | 100 |
2342383 | 2016-04-12 07:00 | 65 |
The dates were originally strings, so I parsed them with lubridate.
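For anyone who wants to reproduce this, here is a minimal sketch of the parsing step, using only the rows from the first table above. I'm assuming `ymd_hms()` is the right lubridate parser for this timestamp format; `sechr` is just what I called my data frame.

```r
library(lubridate)

# Toy rows copied from the sample table; the real data has 2+ million rows.
sechr <- data.frame(
  Id = c(1341231, 1341231, 1341231, 1341231, 2342383),
  Time = c("2016-04-12 07:23:30", "2016-04-12 07:23:40",
           "2016-04-12 07:23:50", "2016-04-12 07:24:00",
           "2016-04-12 07:23:30"),
  Heartrate = c(95, 101, 92, 87, 60),
  stringsAsFactors = FALSE
)

# ymd_hms() turns "YYYY-MM-DD HH:MM:SS" strings into POSIXct (UTC by default).
sechr$Time <- ymd_hms(sechr$Time)

str(sechr)
```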
But after that, things started to go awry.
So, I turned to my best technique: copy-pasting half-understood code.
First, I tried
test_df <- aggregate(Heartrate ~ format(as.POSIXct(sechr$Time), "%m-%d-%y %H"), data=sechr, mean)
but that was no good. As I quickly realized, that dropped the Id out completely, summarizing my data in a more or less useless way.
So, next I tried various formulations of `aggregate`, which doesn't seem to take another argument for another variable, and then experimented with `summarize` and `group_by`, such as below:
testdf3 <- sechr %>% group_by(c(Time ~ format(as.POSIXct(sechr$Time))), "%m-%d-%y %H", Id) %>% summarise(avg_hr=sum(Heartrate))
Needless to say, basically guessing didn't work at all. I produced a lot of errors and several goofy, useless dataframes.
Basically, what I need is a way to say "for each distinct Id, give me the mean of each hour." I think using `xts` is the way to go? Maybe? But I'm puzzled about how to do what I'm trying to do.
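To make the goal concrete, here is a rough sketch of the operation I think I'm after, pieced together from the docs and only tried on toy rows, not the full dataset. It assumes dplyr (1.0 or later, for `.groups`) and lubridate are installed, that `Time` is already POSIXct, and that truncating each timestamp with `floor_date()` is an acceptable way to bucket by hour. The `Hour` and `avg_hr` column names are placeholders I made up.

```r
library(dplyr)
library(lubridate)

# Toy rows copied from the sample table, with Time parsed to POSIXct.
sechr <- data.frame(
  Id = c(1341231, 1341231, 1341231, 1341231, 2342383),
  Time = ymd_hms(c("2016-04-12 07:23:30", "2016-04-12 07:23:40",
                   "2016-04-12 07:23:50", "2016-04-12 07:24:00",
                   "2016-04-12 07:23:30")),
  Heartrate = c(95, 101, 92, 87, 60)
)

hourly <- sechr %>%
  # Round every timestamp down to the start of its hour.
  mutate(Hour = floor_date(Time, unit = "hour")) %>%
  # One group per user per hour.
  group_by(Id, Hour) %>%
  # Mean (not sum) of the readings in each group.
  summarise(avg_hr = mean(Heartrate), .groups = "drop")

print(hourly)
```

On these five rows that should give one row per Id for the 07:00 hour. Whether this scales sensibly to 2 million rows, or whether `xts` would be the better tool, is exactly what I'm unsure about.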