0

pandas has proven very successful as a tool for working with time series data. For example, to compute a 5-minute mean you can use the resample function like this:

import pandas as pd

# Read the CSV, using the first column as a parsed datetime index
dframe = pd.read_csv("test.csv", index_col=0, parse_dates=True)
## 5-minute mean
dframe.resample('5min').mean()
## daily mean
dframe.resample('D').mean()

How can I do this in R?

agstudy
  • 4
    You might attract more help from R folks if you explain (in words) exactly what this Python code does, and what the output is. (Since not everyone who uses R is as familiar with Python as you may be.) – joran Apr 03 '13 at 18:12
  • 3
    Whoa, this is actually the first time I've seen someone ask if/how R can do something they know how to do in `pandas`; usually it's the other way around. I don't know if Wes should be happy or sad... sorry, off topic :) – herrfz Apr 03 '13 at 18:42

3 Answers

3

In R you can use the xts package, which specialises in time series manipulation. For example, you can use the period.apply function like this:

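library(xts)
## simulated daily series: one value per date
zoo.data <- zoo(rnorm(231) + 10, as.Date(13514:13744, origin = "1970-01-01"))
ep <- endpoints(zoo.data, 'days')
## daily mean (with intraday data this averages all observations within each day)
period.apply(zoo.data, INDEX = ep, FUN = mean)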

There are also some handy wrappers around this function:

apply.daily(x, FUN, ...)
apply.weekly(x, FUN, ...)
apply.monthly(x, FUN, ...)
apply.quarterly(x, FUN, ...)
apply.yearly(x, FUN, ...)
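
For the 5-minute case from the question, a minimal sketch along the same lines might look like this. It assumes simulated one-minute data; the names idx, x, ep5, five_min and daily are illustrative only.

```r
library(xts)

# Simulated data (an assumption): one observation per minute for two days
set.seed(1)
idx <- seq(as.POSIXct("2013-04-01 00:00:00", tz = "UTC"),
           by = "min", length.out = 2 * 24 * 60)
x <- xts(rnorm(length(idx)) + 10, order.by = idx)

# 5-minute mean: endpoints() marks the last row of every 5-minute block
ep5 <- endpoints(x, on = "minutes", k = 5)
five_min <- period.apply(x, INDEX = ep5, FUN = mean)

# Daily mean via the convenience wrapper
daily <- apply.daily(x, FUN = mean)
```

Each row of the result carries the timestamp of the last observation in its block, so the first 5-minute mean is stamped 00:04:00 rather than 00:00:00.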
agstudy
  • there are also mechanisms for this in `lubridate` but they're not as efficient. – Justin Apr 03 '13 at 18:17
  • @MatthewPlourde it is from the `xts` package. – agstudy Apr 03 '13 at 18:17
  • 1
    Since @BoleckInk seems unwilling to clarify his question, and you seem to have deciphered it, would you be willing to edit his Q so that it actually makes some sense...? – joran Apr 03 '13 at 18:19
  • @agstudy The edit seems fine to me. I suggest you remove the original version (it will still be there in the history) so the question contains only your version. Well done. – Gavin Simpson Apr 03 '13 at 18:37
0

R has data frames (data.frame) and can also read CSV files, e.g.:

dframe <- read.csv("test.csv")

For dates, you may need to specify the column types using the colClasses parameter (use read.csv2 instead if your file is semicolon-delimited). See ?read.csv. For example, for a three-column file whose first column holds the timestamps:

dframe <- read.csv("test.csv", colClasses=c("POSIXct",NA,NA))

You can then round or truncate the date column with round or trunc, which lets you group the data into intervals of the desired length.

For example,

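library(plyr)  # provides daply()
# trunc() returns POSIXlt, so convert back to POSIXct before storing it in the data frame
dframe$trunc.times <- as.POSIXct(trunc(dframe$date.field, units = "mins"))
means <- daply(dframe, "trunc.times", function(df) mean(df$value))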

Here date.field is the name of the timestamp column and value is the name of the column that you want to average.
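
trunc only works on whole units (minutes, hours, days), so for 5-minute bins one option in base R is cut combined with aggregate. This is a rough sketch on a made-up data frame that reuses the column names date.field and value; the bin column is an illustrative addition.

```r
# Hypothetical data frame with the column names used above
dframe <- data.frame(
  date.field = seq(as.POSIXct("2013-04-01 00:00:00", tz = "UTC"),
                   by = "min", length.out = 20),
  value = 1:20
)

# Label each row with the start of its 5-minute bin
dframe$bin <- cut(dframe$date.field, breaks = "5 min")

# Mean of `value` within each bin
means <- aggregate(value ~ bin, data = dframe, FUN = mean)
```

Note that the bins start from the earliest timestamp in the data, which here falls exactly on a 5-minute boundary.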

Jonathan
0

Personally, I really like a combination of lubridate and zoo aggregate() for these operations:

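library(zoo)
library(lubridate)
## month(), day() and minute() extract that component of each timestamp,
## so values are pooled across years, months or hours respectively
ts.month.sum <- aggregate(ts.data, month, sum)

ts.daily.mean <- aggregate(ts.data, day, mean)

ts.mins.mean <- aggregate(ts.data, minute, mean)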

You can also use zoo's as.yearmon() or as.yearqtr(), or custom functions, for both the split and the apply step. This method is as syntactically sweet as that of pandas.
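
As an illustration of the custom-function route, here is a small sketch on simulated data. The helper floor_5min is hypothetical (not part of zoo or lubridate), and the result names are illustrative.

```r
library(zoo)

# Simulated series (an assumption): one value per minute for 20 minutes
idx <- seq(as.POSIXct("2013-04-01 00:00:00", tz = "UTC"),
           by = "min", length.out = 20)
ts.data <- zoo(1:20, order.by = idx)

# Hypothetical helper: floor each timestamp to the start of its 5-minute bin
floor_5min <- function(t) {
  as.POSIXct(floor(as.numeric(t) / 300) * 300, origin = "1970-01-01", tz = "UTC")
}

# A function passed as `by` is applied to index(ts.data) to form the groups
ts.5min.mean <- aggregate(ts.data, floor_5min, mean)

# Calendar-aware grouping: one value per year-month
ts.month.mean <- aggregate(ts.data, as.yearmon, mean)
```

Unlike grouping by minute(), which pools the same minute across all hours, the floored timestamps keep each 5-minute bin distinct.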

Adam Erickson