I'm trying to build a function that receives a data frame (data), one or two variables to group by (groupby), and the name of a dependent variable (var). The function should then create a plot of the means of var, separated by the group(s) in groupby. A nice-to-have would be an ANOVA at the end.
I'll start with the end: my underlying problem is how to use string values in further manipulations inside a user-defined function.
Unfortunately I can't work out how to parse groupby, even after a couple of days of trying. I tried !!!rlang::parse_exprs, strsplit, etc., but with no success. Currently the function looks something like this (a simplified version with fewer aesthetics):
grp_comp <- function(data, groupby, var){
  data %>%
    filter(!is.na(var)) %>%
    group_by(!!!rlang::parse_exprs(groupby)) %>%
    summarize(n = n(),
              mean = mean(!!!rlang::parse_expr(var)),
              sd = sd(!!!rlang::parse_expr(var)),
              se = sd / sqrt(n)) -> ddata
  gg <- unlist(rlang::parse_exprs(groupby))
  if(length(as.vector(rlang::parse_exprs(groupby))) == 1){
    g <- ggplot(ddata, aes(x = as.character(gg[1]),
                           y = mean)) +
      geom_point()}
  else{
    g <- ggplot(ddata, aes(x = as.character(gg[1]),
                           y = mean,
                           shape = as.character(gg[2]),
                           color = as.character(gg[2])),
                group = as.character(gg[2]))}
  form <- unlist(strsplit(groupby, ';', fixed = TRUE))
  form <- paste(form, collapse = " + ")
  form <- paste(var, " ~ ", form)
  form
  data %>%
    filter(!is.na(var)) %>%
    aov(formula = form) -> anova
  summary(anova) -> anova
  l <- list(ddata, g, anova)
  l
}
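For comparison, here is a minimal sketch of how the data-side steps (filter, grouping, summary, ANOVA) might be written when the column names arrive as strings. It assumes dplyr 1.0 or later, uses the built-in mtcars data as a stand-in for the real data, and the helper names summarise_by and fit_anova are hypothetical. As far as I can tell, filter(!is.na(var)) in the function above tests the string itself rather than the column, and aov() receives a character string instead of a formula object, which would be consistent with the "$ operator" error, since the default terms() method tries to read x$terms.

```r
library(dplyr)

# Hypothetical helper: summarise `var` by one or two grouping columns,
# all passed as character strings.
summarise_by <- function(data, groups, var) {
  data %>%
    filter(!is.na(.data[[var]])) %>%        # test the column, not the string
    group_by(across(all_of(groups))) %>%    # `groups` is a character vector
    summarise(n    = n(),
              mean = mean(.data[[var]]),
              sd   = sd(.data[[var]]),
              se   = sd / sqrt(n),
              .groups = "drop")
}

# Hypothetical helper: build a formula object from strings before calling aov().
fit_anova <- function(data, groups, var) {
  form <- reformulate(groups, response = var)  # e.g. mpg ~ cyl + gear
  summary(aov(form, data = data))
}

# Stand-in data: mtcars with two grouping columns turned into factors.
d      <- transform(mtcars, cyl = factor(cyl), gear = factor(gear))
groups <- strsplit("cyl;gear", ";", fixed = TRUE)[[1]]

ddata <- summarise_by(d, groups, "mpg")
anova <- fit_anova(d, groups, "mpg")
```

Splitting groupby into a character vector once, up front, means the same vector can feed group_by(), the formula, and the plot, so nothing needs to be parsed into expressions.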
My problems are:
a. groupby can contain one or two variables. I can't manage to use the groupby argument for grouping in the ggplots. Either I get "Error: Discrete value supplied to continuous scale" when I use x = gg[1], or I use x = as.factor(gg[1]) or as.character(gg[1]) and get a plot in which x is only labeled "BPL" but is not grouped by the factor.
b. When I try to use two grouping factors instead of one, things get even worse and the plot is completely empty.
c. I manage to create the right formula for the ANOVA, but when I try to actually calculate it I get "Error: $ operator is invalid for atomic vectors". Any ideas why?
d. Not critical, but any ideas for using the second, optional group as color and shape in aes() when the argument contains two groups, without using the if?
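To make points a, b and d concrete, here is a hedged sketch of the plotting step with column names given as strings. It assumes a ggplot2 version that supports the .data pronoun inside aes() (3.0.0 or later), uses mtcars as stand-in data, and plot_means is a hypothetical helper name. Two observations about the function above that may or may not be the whole story: as.character(gg[1]) inside aes() evaluates to one constant string, so every row gets the same x value; and the two-group branch never adds a geom layer, which on its own would give an empty plot.

```r
library(ggplot2)

# Hypothetical helper: plot group means from a summary table `ddata` that has
# a `mean` column plus the grouping columns named in `groups` (character vector).
plot_means <- function(ddata, groups) {
  # .data[[...]] looks the column up by its string name
  p <- ggplot(ddata, aes(x = factor(.data[[groups[1]]]), y = mean)) +
    labs(x = groups[1])
  if (length(groups) > 1) {
    # adding aes() to an existing plot extends its default mapping
    p <- p +
      aes(colour = factor(.data[[groups[2]]]),
          shape  = factor(.data[[groups[2]]]),
          group  = factor(.data[[groups[2]]])) +
      labs(colour = groups[2], shape = groups[2])
  }
  p + geom_point()
}

# Stand-in summary tables built with base R, one per grouping setup.
d1 <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
names(d1)[names(d1) == "mpg"] <- "mean"
d2 <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = mean)
names(d2)[names(d2) == "mpg"] <- "mean"

g1 <- plot_means(d1, "cyl")
g2 <- plot_means(d2, c("cyl", "gear"))
```

This still contains one if, but it only adds the optional aesthetics; the base plot and the geom are written once, so the two cases no longer need separate ggplot() calls.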
Many many thanks in advance!
Guy