I am trying to create a function, formulator, that builds R formulas from a dataframe of responses, coefficients, constants and function names. I intend to use it when converting large sheets of historical functions into usable R code. Rewriting each function by hand as (response ~ constant + b1 x x1 + b2 x x2 ...) is tedious and error-prone.
Below is an example dataframe. All rows share the same set of variables, but not every variable is used in every case (unused ones are NA). Every function has its own row and every term its own column: the column name is the variable and the cell is its coefficient. Not all coefficients are positive.
structure(list(species = c("Pine", "Spruce", "Birch", "Aspen",
"Beech", "Oak", "Noble", "Trivial"), constant = c(-1.6952, -2.2827,
-0.2269, -0.8198, 0.2081, 0.2348, 0.485, 1.9814), lndp1 = c(1.1617,
1.4354, 1.1891, 1.4839, 1.7491, 1.2141, 1.0318, 0.8401), d = c(-0.0354,
-0.0389, -0.0435, -0.024, -0.2167, NA, NA, NA), d2gt = c(0.2791,
0.3106, 0.562, NA, NA, NA, NA, NA)), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))
My idea was that, since the data is in a tidy layout, I could write a function that does this for me and prints the result when called like this:
data %>% formulator(name_column=species, intercept_column=constant, response="Unknown")
In this case there is no response-variable column, but I may know that all rows in this dataframe share the same response, so it would be useful to supply it by hand as a quoted string (a tidyeval issue?). The printout would look like this:
Pine
Unknown ~ -1.6952 + 1.1617 x lndp1 + -0.0354 x d ....
Spruce
Unknown ~ ...
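On the quoting question: one way to mix bare column names with a quoted response is to capture the column arguments unevaluated. The sketch below uses base R's substitute() and deparse() for this; show_args and toy are hypothetical names made up for illustration and are not part of the attempt that follows. With dplyr, the embrace operator {{ }} plays a similar role inside verbs such as select() or pull().

```r
# Hypothetical helper: shows which arguments arrive as bare symbols
# and which as ordinary, already-evaluated values.
show_args <- function(data, name_column, intercept_column, response) {
  # substitute() captures the unevaluated argument; deparse() turns it into text
  name_col <- deparse(substitute(name_column))
  intercept_col <- deparse(substitute(intercept_column))
  list(
    names = data[[name_col]],
    intercepts = data[[intercept_col]],
    response = response  # evaluated normally, so it has to be a quoted string
  )
}

# Toy data: the first two rows of the example above
toy <- data.frame(
  species = c("Pine", "Spruce"),
  constant = c(-1.6952, -2.2827),
  stringsAsFactors = FALSE
)

res <- show_args(toy, name_column = species, intercept_column = constant,
                 response = "Unknown")
str(res)
```

The column arguments are looked up by name inside the data, while response is just a string, so it never needs to exist as a column.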
Here's my thinking so far:
formulator <- function(data, name_column, intercept_column){
  data1 <- data %>% select(-c(name_column, intercept_column))
  function_name <- data[,paste0(name_column)]
  intercepts <- data[,paste0(intercept_column)]
  varlist <- list()
  for(i in 1:dim(data1)[1]){
    data2 <- data1 %>% filter(name_column == paste0(function_name$i)) %>% select_if(~!any(is.na(.)))
    datadim <- dim(data2)[2]
    for(coefs in 1:datadim){
      varlist[paste0(function_name$i)][coefs] <- paste0(data2[1,coefs])
    }
  }
}
This code is incomplete. I think it will handle the varying number of terms in each function, but I'm unsure how to tie it all together.
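For comparison, here is a minimal sketch of one way the pieces could be tied together. It is one possible approach, not a definitive implementation. It assumes the column names are passed as strings (as the attempt above does with paste0()), that every column other than the name and intercept columns holds a coefficient, and that response is a single string; the response argument and its default are assumptions added here. It uses base R only and a three-row subset of the example data.

```r
# Sketch: build one formula string per row, skipping NA coefficients.
formulator <- function(data, name_column, intercept_column, response = "Unknown") {
  data <- as.data.frame(data)  # plain data.frame so data[i, j] returns a scalar
  # Assumption: all remaining columns are coefficient columns
  coef_cols <- setdiff(names(data), c(name_column, intercept_column))
  out <- character(nrow(data))
  for (i in seq_len(nrow(data))) {
    # Named numeric vector of this row's coefficients, with unused (NA) ones dropped
    coefs <- unlist(data[i, coef_cols, drop = FALSE])
    coefs <- coefs[!is.na(coefs)]
    terms <- as.character(data[i, intercept_column])
    if (length(coefs) > 0) {
      terms <- c(terms, paste(coefs, "x", names(coefs)))
    }
    out[i] <- paste(response, "~", paste(terms, collapse = " + "))
  }
  names(out) <- data[[name_column]]
  out
}

# Subset of the example data (Pine, Aspen, Oak)
trees <- data.frame(
  species = c("Pine", "Aspen", "Oak"),
  constant = c(-1.6952, -0.8198, 0.2348),
  lndp1 = c(1.1617, 1.4839, 1.2141),
  d = c(-0.0354, -0.024, NA),
  d2gt = c(0.2791, NA, NA),
  stringsAsFactors = FALSE
)

res <- formulator(trees, name_column = "species", intercept_column = "constant")

# Print each function name followed by its formula
for (nm in names(res)) cat(nm, "\n", res[[nm]], "\n", sep = "")
```

Working row by row on a named vector avoids the filter() and select_if() steps, because dropping the NA entries handles the varying number of terms directly. The output keeps the "x" notation used in this post; changing the separator in the paste() call is a one-line edit if a different operator is wanted.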