I'm trying to write a function that visually compares the distribution of a variable with the distribution of the same variable after a Box-Cox transformation. The variable is a single column pulled from my data frame.

    library(EnvStats)
    library(ggpubr)  # ggarrange(), ggdensity(), annotate_figure(), text_grob()

    bc_compare_1 <- function(var) {
      # Fit an intercept-only model and pick the lambda that maximises the objective
      bc_var <- boxcox(lm(var ~ 1))
      lambda <- bc_var$x[which.max(bc_var$y)]
      var_T <- (var^lambda - 1) / lambda

      # Density + histogram of the original and the transformed variable, side by side
      g <- ggarrange(
        ggdensity(var, fill = "grey", alpha = 0.3) +
          geom_histogram(colour = 1, fill = "white",
                         position = "identity", alpha = 0) +
          ggtitle("original") +
          theme(plot.title = element_text(size = 11)),
        ggdensity(var_T, fill = "grey", alpha = 0.3) +
          geom_histogram(colour = 1, fill = "white",
                         position = "identity", alpha = 0) +
          ggtitle("transformed") +
          theme(plot.title = element_text(size = 11)))

      # Use the name of the passed expression as the figure title
      g <- annotate_figure(g, top = text_grob(substring(deparse(substitute(var)), 3), size = 11))
      l <- list(g, paste("lambda = ", lambda))
      return(l)
    }

Unfortunately, this doesn't work:

    Error in model.frame.default(formula = var ~ 1, drop.unused.levels = TRUE) :
      object is not a matrix

I tried several things, but nothing works. The problem seems to be that boxcox() cannot handle a linear model that was created inside a function, because I get the same error even in this minimal example:

    library(EnvStats)

    testt <- function(var) {
      boxcox(lm(var ~ 1))
    }

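One possible explanation, offered as an unverified sketch rather than a confirmed diagnosis: if boxcox() refits the model internally with update(), the stored call lm(var ~ 1) is re-evaluated in a different environment, where the name var no longer refers to the function argument and may resolve to something else (for example the function stats::var). The base-R sketch below reproduces that kind of lookup failure without EnvStats. refit() and make_fit() are hypothetical helpers, and whether EnvStats actually works this way is an assumption.

```r
# Hypothetical stand-in for a package function that refits a model via update()
refit <- function(fit) update(fit, y = TRUE)

# Builds the model inside a function, like testt() above
make_fit <- function(var) lm(var ~ 1)

fit <- make_fit(c(1.2, 3.4, 2.2, 5.1))

# update() re-evaluates the stored call lm(formula = var ~ 1) in refit()'s frame,
# where `var` is no longer the argument of make_fit()
res <- tryCatch(refit(fit), error = function(e) conditionMessage(e))
print(res)  # an error message instead of a refitted model
```

In this sketch the refit fails because the name lookup never reaches the frame of make_fit(). The exact message may differ between R versions.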
Edit: passing the data argument to lm() didn't work either:

    library(EnvStats)
    library(dplyr)  # %>% and pull()

    testt <- function(data, var) {
      # Extract the column as a plain vector
      data %>%
        pull(var) -> dvar
      lmvar <- lm(data = data, formula = dvar ~ 1)
      boxcox(lmvar)
    }

This also fails:

    Error in model.frame.default(formula = (data %>% pull(var)) ~ 1, data = data, :
      'data' must be a data.frame, environment, or list

(The data is a data frame.)
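For comparison, here is a minimal base-R sketch that estimates lambda directly from the vector, so no lm object has to be created inside the function at all. It maximises the standard Box-Cox profile log-likelihood for an intercept-only model over a grid. bc_loglik() and bc_lambda() are hypothetical helper names, the grid is an arbitrary choice, and the input is assumed to be a strictly positive numeric vector without missing values. It is not the EnvStats implementation and may pick a slightly different lambda.

```r
# Profile log-likelihood of the Box-Cox parameter for an intercept-only model.
# Assumes `x` is strictly positive and has no missing values.
bc_loglik <- function(lambda, x) {
  n <- length(x)
  # Box-Cox transform, with the log limit for lambda close to 0
  x_t <- if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
  rss <- sum((x_t - mean(x_t))^2)
  -n / 2 * log(rss / n) + (lambda - 1) * sum(log(x))
}

# Hypothetical helper: evaluate the likelihood on a grid and return the maximiser
bc_lambda <- function(x, grid = seq(-2, 2, by = 0.01)) {
  stopifnot(is.numeric(x), all(x > 0))
  ll <- vapply(grid, bc_loglik, numeric(1), x = x)
  grid[which.max(ll)]
}

set.seed(1)
x <- rlnorm(200)        # log-normal sample, so lambda should be close to 0
lambda <- bc_lambda(x)
print(lambda)
```

The resulting lambda could then be plugged into the transformation step of bc_compare_1() in place of the boxcox(lm(...)) call.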
Any ideas?
Thanks a lot in advance!
Guy