Given a data.frame, I would like to (dynamically) create a formula `y ~ .`, where `y` is the name of the first column of the data.frame.

What complicates this beyond the approach of `as.formula(paste(names(df)[1], "~ ."))` is that the column name might not be a syntactic name; it can be the text of a function call, e.g.:

`names(model.frame(lm(I(Sepal.Length/Sepal.Width) ~ Species, data = iris)))[1]` is `"I(Sepal.Length/Sepal.Width)"`

So I need the column name to be quoted with backticks, i.e. in the above example I would want the formula (backticks included) to be: `I(Sepal.Length/Sepal.Width)` ~ .

This works:

df <- model.frame(lm(I(Sepal.Length/Sepal.Width) ~ Species, data = iris))
fm <- . ~ .
fm[[2]] <- as.name(names(df)[1])

But is there a neat way to do it in one step?

Mark
  • Is it not possible to create a new column, `yvar` that is a copy of the first column? Then you can call `yvar` in the `paste` solution... – Lil' Pete Sep 20 '21 at 23:11
  • That would be a problem if there was already a column `df$yvar`. (I could check that the new column is uniquely named, but I'd rather avoid potentially complicated steps like that) – Mark Sep 20 '21 at 23:21
  • `I(...)` doesn't need to be quoted: `lm(formula(model.frame(lm(I(Sepal.Length/Sepal.Width) ~ Species, data = iris))), iris)` and running `formula` on a `model.frame` or a `data.frame` gives the formula you are looking for – rawr Sep 20 '21 at 23:38
  • The idea is to `simulate` new outcome data, put it in the first column of the model frame and refit the model. The simulated data is just one column, so with the above approach it would look for `Sepal.Length` and `Sepal.Width` instead of the new data in column `I(Sepal.Length/Sepal.Width)` – Mark Sep 20 '21 at 23:47
  • Are you using the same predictors? You can create the design matrix and use `lm.fit` directly: `mf <- model.matrix(~ 1 + ., data = mtcars[-1]); replicate(3, lm.fit(mf, rnorm(32)), simplify = FALSE)`. Then you don't have to work with formulas and can feed the new response directly into the function – rawr Sep 21 '21 at 00:07
  • I am using the same predictors, but this is intended to be a general function that can be used with almost any model fit that has a `simulate` function (`lm`, `glm`, `lme4::lmer`, ...), so relying on `lm.fit` specifically won't work. My plan was to use `update(mdl, first.column ~ ., data = mf)` to refit the model on the modified `mf`, where `first.column` (which might be, e.g., a matrix in the case of a binomial model) has been replaced by the simulated data; hence the question. I'll ask separate questions related to other aspects of the implementation. – Mark Sep 21 '21 at 00:24

1 Answer

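We could use `reformulate`, wrapping the column name in backticks so that it is parsed as a single name rather than as a call: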

reformulate(".", response = sprintf("`%s`", names(df)[1]))
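
As a rough check of the one-liner above, the following sketch rebuilds `df` from the question, constructs the formula, and confirms that its left-hand side is a single name matching the first column. The variables `fm`, `fm_alt` and `fit` are illustrative names, not part of the question. Passing `as.name(...)` as `response` is shown as an alternative and assumes a reasonably recent R version in which `reformulate` accepts a symbol there.

```r
# Model frame whose first column has the non-syntactic name
# "I(Sepal.Length/Sepal.Width)".
df <- model.frame(lm(I(Sepal.Length / Sepal.Width) ~ Species, data = iris))

# Backticks make the column name parse as one symbol, not as a call to I().
fm <- reformulate(".", response = sprintf("`%s`", names(df)[1]))

# The left-hand side is a single name equal to the first column name.
stopifnot(is.name(fm[[2]]))
stopifnot(identical(as.character(fm[[2]]), names(df)[1]))

# Alternative (assumes reformulate() accepts a symbol as response):
fm_alt <- reformulate(".", response = as.name(names(df)[1]))
stopifnot(identical(fm_alt[[2]], fm[[2]]))

# Refitting against the model frame uses the stored column directly,
# without re-evaluating Sepal.Length / Sepal.Width.
fit <- lm(fm, data = df)
```

Because the response is looked up as a column of `df`, replacing that column with simulated values before refitting should pick up the new data, which is the use case described in the comments.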
akrun