I have a data frame with three variables: Year, Location, and Concentration. I want to aggregate the data by year and location and calculate confidence intervals for concentration.

Year <- rep(c(2010, 2011, 2012, 2013), each=15)
Location <- rep(c("Texas", "Colorado", "Washington"), times = 4, each = 5)
Concentration <- runif(60, 0, 100)

conc_data <- cbind.data.frame(Year, Location, Concentration)
head(conc_data)

  Year Location Concentration
1 2010    Texas      22.54480
2 2010    Texas      70.38605
3 2010    Texas      79.53292
4 2010    Texas      95.62562
5 2010    Texas      38.81795
6 2010 Colorado      68.69821

I have tried using the aggregate function with a custom function for calculating confidence intervals, posted by @efbbrown here: How to calculate confidence intervals for a vector?. However, it uses all of the Concentration data to calculate the lower confidence limit, instead of only the Concentration values in each group.

aggregate(Concentration ~ Location + Year, data = conc_data, function(x) confidence_interval_lwr(conc_data$Concentration, 0.95))

confidence_interval_lwr <- function(vector, interval) {
  # Standard deviation of sample
  vec_sd <- sd(vector)
  # Sample size
  n <- length(vector)
  # Mean of sample
  vec_mean <- mean(vector)
  # Error according to t distribution
  error <- qt((interval + 1)/2, df = n - 1) * vec_sd / sqrt(n)
  # Confidence interval as a vector
  lwr <- c("lower" = vec_mean - error)
  return(lwr)
}

I would like to get a lower limit of the confidence interval for each year and location as such:

  Year   Location  lwr
1 2010      Texas  8.2
2 2010   Colorado  5.9
3 2010 Washington 15.0
4 2011      Texas 10.0
5 2011   Colorado  2.0
6 2011 Washington 18.0
kwh

1 Answer

In the anonymous function (function(x)), 'x' holds the 'Concentration' values of the current group, so pass 'x' to confidence_interval_lwr instead of conc_data$Concentration:

aggregate(cbind(lwr = Concentration) ~ Location + Year, data = conc_data, 
      function(x) confidence_interval_lwr(x, 0.95))
#     Location Year        lwr
#1    Colorado 2010 13.1289089
#2       Texas 2010 14.3379460
#3  Washington 2010 30.4922382
#4    Colorado 2011 18.9369171
#5       Texas 2011  0.6261571
#6  Washington 2011 12.2817138
#7    Colorado 2012  3.7365737
#8       Texas 2012 11.1165898
#9  Washington 2012 32.9729329
#10   Colorado 2013 23.9445299
#11      Texas 2013  3.0298597
#12 Washington 2013  9.0199863

NOTE: the values will differ from run to run because set.seed was not called before creating the runif column.
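If both limits are needed, the same idea extends to a function that returns a named vector. The following is only a sketch: the helper name confidence_interval and the seed value 42 are assumptions made for illustration, not part of the original question. aggregate stores a vector-valued result as a matrix column, so do.call(data.frame, ...) is used to flatten it into ordinary columns.

```r
set.seed(42)  # assumption: any fixed seed, used only to make the run reproducible

Year <- rep(c(2010, 2011, 2012, 2013), each = 15)
Location <- rep(c("Texas", "Colorado", "Washington"), times = 4, each = 5)
Concentration <- runif(60, 0, 100)
conc_data <- data.frame(Year, Location, Concentration)

# Hypothetical helper: returns both limits of the t-based interval
confidence_interval <- function(x, interval = 0.95) {
  n <- length(x)
  error <- qt((interval + 1) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(lwr = mean(x) - error, upr = mean(x) + error)
}

# 'x' inside FUN is the Concentration vector of one Location/Year group
res <- aggregate(Concentration ~ Location + Year, data = conc_data,
                 FUN = confidence_interval)

# Flatten the matrix column into Concentration.lwr and Concentration.upr
res <- do.call(data.frame, res)
head(res)
```

Each row of res then describes one Location and Year combination, and the limits should match what t.test(x)$conf.int reports for the same group.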

akrun