I'm trying to calculate 8-hour rolling means using a ddply + rollingMean command on a pollutant data frame that looks something like this:

df1
date                co      code
2000-01-17 01:00:00 0.97000 42
2000-01-17 02:00:00 0.97000 42
2000-01-17 03:00:00 0.98000 42
2000-01-17 04:00:00 0.98000 42
2000-02-04 08:00:00 0.70000 42
2000-02-04 09:00:00 1.40000 42
2000-02-04 10:00:00 1.51000 42
2000-02-04 11:00:00 1.49000 43
2000-02-04 12:00:00 1.98000 43
2000-02-04 15:00:00 1.61000 43
2000-02-04 16:00:00 1.88000 43
2000-02-04 17:00:00 1.64000 43
2000-02-04 18:00:00 1.62000 43
2000-02-04 19:00:00 2.05000 43

As you can see, the time series isn't complete (that's why I'm using openair's rollingMean, which handles gaps based on the "date" column), and there are several station "codes" (which I split with ddply because rollingMean doesn't work with more than one station at a time).

However, when I use this code:

pd<-ddply(df1,.(code),function(df){df<-rollingMean(df,pollutant="co",
           width=8,new.name="rolling",data.thresh=75);return(df)})

The return is:

Error: 'by' is NA

Can anyone help me with this error?
Thanks in advance.

PS: Using a similar "o3" data frame like this:

> head(var2)
date                o3    codigo
2000-01-01 01:00:00 23.25      1
2000-01-01 02:00:00 20.08      1
2000-01-10 16:00:00 63.67      1
2000-01-10 17:00:00 80.64      1
2000-01-10 18:00:00 86.48      1
2000-01-10 19:00:00 61.48      1

and this command:

pd<-ddply(var2,.(codigo),function(df){df<-rollingMean(df,pollutant="o3",
           width=8,new.name="medmov",data.thresh=75);return(df)})

the code works just fine, showing:

> head(pd)
date                o3    codigo  medmov
2000-01-01 01:00:00 23.25      1      NA
2000-01-01 02:00:00 20.08      1      NA
2000-01-01 03:00:00 22.31      1      NA
2000-01-01 04:00:00 23.02      1 22.1650
2000-01-01 05:00:00 12.40      1 20.2120
2000-01-01 06:00:00 11.67      1 16.2575
ccl
  • Could you edit the question to give an example of how you expect the output to look? – jalapic Sep 09 '14 at 03:22
  • @Camila Lopes I couldn't reproduce the error; it works fine for me using the example shown. Please use `dput` to share an example dataset that triggers the error, for example `dput(head(df1,20))`. – akrun Sep 09 '14 at 04:12

1 Answer

Problem solved.

@akrun, my data frame is huge (1490375 obs. and 61 different stations), so I tried to use dput on a subset of it. When I realised that the command worked on some subsets, I started testing different sizes to find the exact part of the data that caused the error.
Once I got down to a 100 obs. data frame, I saw that one particular station had a single observation, not only in the subset but in the entire data frame! (A simple summary(df1$code) would have found that quickly. My bad.)
After excluding this observation, the command worked smoothly.

So this type of error probably occurs when rollingMean is handed a station with too few observations to calculate the rolling mean (here, a station with a single row). I would never have guessed that.
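For anyone hitting the same thing, here is a minimal sketch of the check and the filtering step, in base R only. The toy values, the extra station 44 and the min_obs cutoff are made up for illustration; the real cutoff that rollingMean needs may be different, so treat it as an assumption.

```r
# Toy data shaped like df1; station 44 is hypothetical and has one row only
df1 <- data.frame(
  date = as.POSIXct(c("2000-01-17 01:00:00", "2000-01-17 02:00:00",
                      "2000-01-17 03:00:00", "2000-02-04 11:00:00",
                      "2000-02-04 12:00:00", "2000-02-04 15:00:00",
                      "2000-03-01 09:00:00"), tz = "UTC"),
  co   = c(0.97, 0.97, 0.98, 1.49, 1.98, 1.61, 1.10),
  code = c(42, 42, 42, 43, 43, 43, 44)
)

# Count rows per station; table() gives counts whether code is numeric or a factor
n_obs <- table(df1$code)
print(n_obs)

# Keep only stations with at least min_obs rows (min_obs = 2 is an assumed cutoff)
min_obs <- 2
keep <- names(n_obs)[as.vector(n_obs) >= min_obs]
df1_ok <- df1[as.character(df1$code) %in% keep, ]

# The original call can then be run on the filtered data (needs plyr and openair):
# pd <- ddply(df1_ok, .(code), function(df) rollingMean(df, pollutant = "co",
#             width = 8, new.name = "rolling", data.thresh = 75))
```

Printing n_obs first makes the odd station easy to spot before deciding what to drop.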

Anyway, thanks @akrun and @jalapic. :)

ccl