-1

I have one issue on calculating the annual average of my data. Here is my data (dt.select):

head(dt.select)    

        DATE      SITE_ID   TSO4   TNO3   TNH4   NSO4
61372 2000-01-07  ALH157 1.9873 3.3794 1.5214 0.2743
61369 2000-01-14  ALH157 1.9451 4.3169 1.4218 0.9190
61376 2000-01-21  ALH157 2.9216 4.7621 2.2883 1.1009
61371 2000-01-28  ALH157 2.9312 5.5041 2.4494 1.9243
67249 2000-02-04  ALH157 3.1832 7.7116 3.1407 0.9057
67244 2000-02-11  ALH157 5.1027 9.7623 4.6308 0.3478

tail(dt.select)

        DATE      SITE_ID   TSO4   TNO3   TNH4   NSO4
49342 2015-11-27  ALH157 1.1826 1.8821 0.7209 0.6908
55291 2015-12-04  ALH157 1.5332 3.9142 1.6252 0.7613
55292 2015-12-11  ALH157 1.5277 1.0478 0.5452 0.6588
55293 2015-12-18  ALH157 0.9947 1.0137 0.4906 0.5088
55294 2015-12-25  ALH157 0.7792 0.6357 0.4327 0.5266
55260 2016-01-01  ALH157 1.4620 1.7782 0.9438 0.4873

It is almost 15 years data. Now, I use code below to get the annual average.

Annual.Ave <- aggregate(dt.select[-2], format(dt.select[1],"%Y"),
                        mean, na.rm = TRUE)

The results is kind of what I wanted, but the column of "SITE_ID" is removed and there is random “DATE” column which I do not needed.

head(Annual.Ave)
   DATE    DATE     TSO4      TNO3      TNH4
1 2000 2000-07-03 4.250796 2.943349 2.057633
2 2001 2001-07-02 3.986159 2.896429 1.949810
3 2002 2002-07-01 4.483027 2.600660 1.985302
4 2003 2003-06-30 3.831227 2.493104 1.786642
5 2004 2004-07-02 3.375330 2.319875 1.556811
6 2005 2005-07-04 4.785614 2.399269 2.089143

Do you know how to fix the code to get the results as below ?

DATE SITE_ID   TSO4   TNO3   TNH4   NSO4
1 2000 ALH157 4.250796 2.943349 2.057633
user5249203
  • 4,436
  • 1
  • 19
  • 45
Lee Yee
  • 17
  • 1
  • 8
  • Since you want the SITE_ID column in, are you trying to get the annual average of the TS/TN/NS columns *per station*? – mathematical.coffee Apr 14 '16 at 23:47
  • Hi, That is data from only one site (ALH157). but I like to keep this column. – Lee Yee Apr 14 '16 at 23:51
  • But if you have multiple SITE_ID in one dataframe, then what should this column show, since it is aggregated over all SITE_ID? (That is why it is dropped. It doesn't make sense to keep it *unless* you want the total per SITE_ID, or you are guaranteed that each dataframe has exactly one SITE_ID in it) – mathematical.coffee Apr 15 '16 at 00:02
  • @LeeYee You cannot take the mean of SITE_ID ... since it's not numeric. If you want `mean`s within categories of SITE_ID then you need to include it along with the year-index as part of the second argument. – IRTFM Apr 15 '16 at 00:19
  • The data was I `subset` from a big dataset. After getting the annual average I will merge it with other data. So it will be nice if I can keep the SITE_ID column – Lee Yee Apr 15 '16 at 00:21
  • Then `cbind` it. Don't expect `aggregate` to do all the work for you. – IRTFM Apr 15 '16 at 00:22

2 Answers2

1

Here is one way to go:

# create year variable
dt.select$year <- format(dt.select[1],"%Y")
# aggregate
Annual.Ave <- aggregate(dt.select[-c(1,2)], df[c("SITE_ID", "year")],
                    mean, na.rm = TRUE)

or following @42-'s suggestion,

    # create year variable
dt.select$year <- format(dt.select[1],"%Y")
# aggregate
Annual.Ave <- aggregate(dt.select[-c(1,2)], df["year"],
                    mean, na.rm = TRUE)
Annual.Ave$site <- "ALH157"
lmo
  • 37,904
  • 9
  • 56
  • 69
  • Thanks a lot @lmo, I get erro message by runing your code `Error: unexpected ',' in "Annual.Ave <- aggregate(dt.select[-(1,"` – Lee Yee Apr 15 '16 at 01:46
1

This will get you a Columnwise mean for your data, for respective years and then leaves the Site_Id as well.

library("plyr")
dates<- format(as.Date(dt.select$DATE), format = "%Y")
# Column wise
ddply(dt.select, .(SITE_ID, dates), summarise,
      mean_TSO4 = mean(TSO4),
      mean_TNO3 = mean(TNO3),
      mean_TNH4 = mean(TNH4),
      mean_NSO4 = mean(NSO4))

output

   SITE_ID dates mean_TSO4 mean_TNO3 mean_TNH4 mean_NSO4
1  ALH157  2000   3.01185  5.906067   2.57540   0.91200
2  ALH157  2015   1.20348  1.698700   0.76292   0.62926
3  ALH157  2016   1.46200  1.778200   0.94380   0.48730

I hope it helps. You can learn more from this plyr package

user5249203
  • 4,436
  • 1
  • 19
  • 45