This problem has me stumped.
I have the following data frame:
    library(dplyr)
    # approximation of data frame
    x <- data.frame(doy   = sample(seq(200, 300), 20, replace = TRUE),
                    year  = sample(c("2000", "2005"), 20, replace = TRUE),
                    phase = sample(c("pre", "post"), 20, replace = TRUE))
and a simple summary function, built on summarize(), that takes the column name as a string and works nicely:
    getStats <- function(df, col) {
      col <- as.name(col)
      df %>%
        group_by(year, phase) %>%
        summarize(n = sum(!is.na(col)),
                  mean = mean(col, na.rm = TRUE),
                  sd = sd(col, na.rm = TRUE),
                  se = sd/sqrt(n))
    }
    > getStats(x, "doy")
    Source: local data frame [4 x 6]
    Groups: year [?]
        year  phase     n    mean       sd       se
      <fctr> <fctr> <int>   <dbl>    <dbl>    <dbl>
    1   2000   post     8 248.625 30.42526 10.75695
    2   2000    pre     2 290.000 14.14214 10.00000
    3   2005   post     5 231.400 32.86031 14.69558
    4   2005    pre     5 274.200 29.79429 13.32441
However, if I modify the function to get the median, it returns an error:
    getStats <- function(df, col) {
      col <- as.name(col)
      df %>%
        group_by(year, phase) %>%
        summarize(n = sum(!is.na(col)),
                  mean = mean(col, na.rm = TRUE),
                  med = median(col, na.rm = TRUE), # new line
                  sd = sd(col, na.rm = TRUE),
                  se = sd/sqrt(n))
    }
    > getStats(x, "doy")
    Error in median (doy, na.rm = TRUE): object "doy" not found
I've tried a host of name and position changes, but all yield the same result: median() doesn't accept the column name passed in as a variable, even though mean() and sd() do. I assume I'm missing something so basic that I'll facepalm when someone points it out, but in the meantime I feel like I'm losing my sanity. I appreciate any insights!
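For reference, here is a minimal sketch of the behaviour being asked for. It assumes a newer dplyr than the one that produced the output above, namely one that provides the `.data` pronoun (dplyr 0.7.0 or later, as far as I know). The name `getStats2` is hypothetical. It indexes `.data` with the column name string instead of converting the string with `as.name()`, so every summary function, including `median()`, receives the actual column values.

```r
library(dplyr)

# Hypothetical variant of getStats(); assumes the .data pronoun is available.
getStats2 <- function(df, col) {
  df %>%
    group_by(year, phase) %>%
    summarize(n    = sum(!is.na(.data[[col]])),          # non-missing count
              mean = mean(.data[[col]], na.rm = TRUE),
              med  = median(.data[[col]], na.rm = TRUE),
              sd   = sd(.data[[col]], na.rm = TRUE),
              se   = sd / sqrt(n))                       # uses the sd and n columns just computed
}
```

Calling `getStats2(x, "doy")` on the data frame above should then return one row per year/phase group with the `med` column included.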