How to ignore NAs in certain rows to calculate a group-level 5-year weighted average in R

Question

In the attached data.table, I am trying to calculate a 5-year weighted average of the AF2 column. AF2 is a function of other columns in the dataset based upon the following:

Region  Country Group   Item    Year    Dom.Supply  Feed    Seed
NAm.Oceania Australia   Cereals Wheat   1961    1727    111 421
NAm.Oceania Australia   Cereals Wheat   1962    1807    123 420
NAm.Oceania Australia   Cereals Wheat   1963    1933    166 452
NAm.Oceania Australia   Cereals Wheat   1964    1939    182 444
NAm.Oceania Australia   Cereals Wheat   1965    2016    250 519
NAm.Oceania Australia   Cereals Rye 1961    768 NA  174
NAm.Oceania Australia   Cereals Rye 1962    840 NA  170
NAm.Oceania Australia   Cereals Rye 1963    615 NA  181
NAm.Oceania Australia   Cereals Rye 1964    980 NA  109
NAm.Oceania Australia   Cereals Rye 1965    918 NA  190
NAm.Oceania Canada  Cereals Wheat   1961    3736    1195    1027
NAm.Oceania Canada  Cereals Wheat   1962    3835    1208    1064
NAm.Oceania Canada  Cereals Wheat   1963    4301    1462    1145
NAm.Oceania Canada  Cereals Wheat   1964    3842    1262    1089
NAm.Oceania Canada  Cereals Wheat   1965    4323    1356    1140
NAm.Oceania Canada  Cereals Rye 1961    133 72  15
NAm.Oceania Canada  Cereals Rye 1962    126 61  20
NAm.Oceania Canada  Cereals Rye 1963    163 103 18
NAm.Oceania Canada  Cereals Rye 1964    125 64  20
NAm.Oceania Canada  Cereals Rye 1965    283 201 20

data$AF2 <- 1-((data$Feed + data$Seed) / data$Dom.Supply)

This results in NA values for rows 6-10.

The following code (provided as a result of an earlier question) returns "NaN" - which makes sense because of the NA values.

data2 <- data[, list(
AF2.WA=sum(Dom.Supply*AF2)/sum(Dom.Supply)),
by=Year]

Modifying to remove the NAs, as follows, provides an error message (Error in colMeans(data2[, -1, with = F], na.rm = T) : 'x' must be numeric)

data2 <- data[, list(
AF2.WA=sum(Dom.Supply*AF2)/sum(Dom.Supply), na.rm=T),
by=Year]

The desired output is the following code (ultimately having it in tabular form for all Regions & Groups):

colMeans(data2[,-1, with=F], na.rm=T)
require(plotrix)
std.error(data2[, -1, with=F])

How do I get R to ignore only those rows where the value in AF2 is NA to calculate a total weighted average for the Group "Cereals"? Ultimately, this will run in a loop calculate weighted averages for AF2 across 7 other Groups and 7 other Regions.

Thanks!

`sum(Dom.Supply*AF2)/sum(Dom.Supply), na.rm=T)` doesn't look like a correct syntax. Maybe `sum(Dom.Supply*AF2, na.rm=TRUE)/sum(Dom.Supply, na.rm=TRUE)`? — David Arenburg, May 20 '15 at 11:56
what about the `weighted.mean` function with the `na.rm = TRUE` ? — SabDeM, May 20 '15 at 11:56
Hi David. That change generates a result of 0.4263. However, my result in Excel (as a check) is 0.4839. The difference lies in R summing all Dom.Supply values in all years, whilst in Excel I excluded those values for years where AF2 = NA. Which do you think is the correct approach? — Expat_Canuck, May 20 '15 at 12:01
@SabDeM, weighted.mean gives yet a different value for AF2; 0.4595. I'm not sure which of these three is "accurate" for statistical research purposes. — Expat_Canuck, May 20 '15 at 12:04
How do you want to handle your `NA`s? Does `data$AF2 <- 1-((replace(data$Feed, is.na(data$Feed), 0) + data$Seed) / data$Dom.Supply)` gives you desired result? — David Arenburg, May 20 '15 at 12:05
Hi David, if I could assume that I can replace the NAs with "0" then your replace code works. I guess my real question now is "how to handle the NAs", which is beyond this forum. Thanks all! — Expat_Canuck, May 20 '15 at 12:13

How to ignore NAs in certain rows to calculate a group-level 5-year weighted average in R

0 Answers0