I have repeating dates, a column for returns, and then many other columns of other variables that I am trying to sort on and then extract the average return of the top quartile of the sorted variable constituents for each date. There are NA's and I want to ignore the NA's for each column individually when sorting.
The initial data looks like this
date return a b c d
2/1/18 5 3 5 4 9
2/1/18 6 2 1 NA 7
2/1/18 5 NA 6 NA 5
2/1/18 NA 1 NA 2 NA
2/1/18 NA NA NA 1 NA
2/2/18 NA NA 2 NA NA
2/2/18 4 10 4 6 NA
2/2/18 7 5 NA 2 NA
2/2/18 8 7 7 9 NA
2/3/18 NA 2 NA NA NA
2/3/18 3 NA 6 5 8
2/3/18 6 5 2 4 4
2/3/18 5 8 8 1 9
2/4/18 6 8 6 3 1
2/4/18 5 2 5 9 10
2/4/18 7 4 2 10 8
I want the end data to come out as follows
date high a return high b return high c return high d return
2/1/18 5 5 5 5
2/2/18 4 8 8 NA
2/3/18 8 5 3 5
2/4/18 6 6 7 5
I am trying to switch my code from performing the following a bunch of times for different variables.
High = df[!is.na(df$a),] %>%
group_by(date) %>%
filter(a > quantile(a, .666)) %>%
summarise(high_return = mean(return))
I have switched to this code but have been unable to replicate the way I dealt with na's in the previous code (i.e. the !is.na).
list <- c("a", "b", "c", "d")
High <- df %>%
group_by(date) %>%
summarize_at(vars(one_of(list)),
funs(HighReturn = mean(return[na.omit(.) > quantile((.), .666, na.rm = TRUE)]))
This does no produce the same results as doing the columns one by one. I have also tried removing the na.omit, adding na.omit to the right side "." and other combinations. Is there a way to produce !is.na for each column as it passes through the function?