0

I have a high-dimensional data frame df with dimensions of 3000 x 80 (a document term matrix). I have a classification function that takes in two arguments: formula and data. For formula, I want it to take all the features (variables) of df automatically. Is there a way to take in a list of all column names to create a formula object?

Ben Bolker
juanjedi
  • Formulas can use a `.` to refer to all variables: `~ . ` See [this question](https://stackoverflow.com/q/13446256/4996248) – John Coleman Sep 30 '20 at 23:29
  • True, but wildcards can only be used with implemented functions such as `lm` and others right? What if my function doesn't support this? – juanjedi Sep 30 '20 at 23:31

2 Answers

4

You could probably do

reformulate(names(df))

which will produce a one-sided formula with all of the variable names. (It's really not much more than syntactic sugar for as.formula(paste("~", paste(names(df), collapse="+"))).)
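
As a rough sketch of how this might look in practice: the toy data frame and the column name `label` below are made up for illustration, and it assumes the response column lives in the same data frame, so it is dropped from the predictor list first. `reformulate()` also accepts a `response` argument if a two-sided formula is needed.

```r
# Toy stand-in for the document term matrix (hypothetical column names)
df <- data.frame(
  label = c("a", "b", "a"),
  term1 = c(1, 0, 2),
  term2 = c(0, 3, 1)
)

# Every column except the (assumed) response column
predictors <- setdiff(names(df), "label")

# One-sided formula containing all predictors
f_rhs <- reformulate(predictors)

# Two-sided formula with the response on the left-hand side
f_full <- reformulate(predictors, response = "label")

print(f_rhs)
print(f_full)
```

Either object can then be passed as the `formula` argument of the classification function. Column names that are not syntactically valid R names would likely need to be wrapped in backticks before being handed to `reformulate()`.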

Ben Bolker
-1

For a formula use:

as.formula(paste("dependent_var ~", paste(names(df), collapse = "+")))
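
One caveat with the `paste` approach: if `dependent_var` is itself a column of `df`, then `names(df)` puts it on the right-hand side as well. A minimal sketch that excludes it first, using a made-up toy data frame (only `dependent_var` comes from the snippet above; the other names are hypothetical):

```r
# Toy data frame; term1 and term2 are hypothetical feature columns
df <- data.frame(
  dependent_var = c(1, 0, 1),
  term1 = c(2, 0, 1),
  term2 = c(0, 3, 1)
)

# Keep only the feature columns on the right-hand side
predictors <- setdiff(names(df), "dependent_var")

# Build the formula string, then convert it to a formula object
f <- as.formula(paste("dependent_var ~", paste(predictors, collapse = " + ")))

print(f)
```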
M--
Paul