I have a dataframe `train` (21 predictors, 1 response, 1012 observations), and I suspect that the response is a nonlinear function of the predictors. Thus, I would like to perform a multivariate polynomial regression of the response on all the predictors, and then try to understand which are the most important terms. To avoid the collinearity problems of standard multivariate polynomial regression, I'd like to use multivariate orthogonal polynomials with `polym()`. However, I have quite a lot of predictors, and their names do not follow a simple rule. For example, `train` contains predictors named `X2`, `X3` and `X5`, but not `X1` or `X4`. The response is `X14`. Is there a way to write the formula in `lm` without having to explicitly list all the predictor names? Writing
`OrthoModel = lm(X14 ~ polym(., 2), data = train)`

returns the error

`Error in polym(., 2) : object '.' not found`
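For context, the `.` shorthand is only understood by the formula parser itself, not by functions called inside the formula. A minimal reproduction with a made-up toy `train` (the real predictor names differ):

```r
# Toy data standing in for 'train' (made-up values; the real frame has 21 predictors)
set.seed(1)
train <- data.frame(X2 = rnorm(20), X3 = rnorm(20), X14 = rnorm(20))

# '.' expands to "all other columns" when used directly in a formula...
fit <- lm(X14 ~ ., data = train)
names(coef(fit))
# "(Intercept)" "X2" "X3"

# ...but '.' is not an R object, so it cannot be an argument to polym():
# lm(X14 ~ polym(., 2), data = train)   # Error: object '.' not found
```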
EDIT: the model I originally wanted to fit contains about 3.5 billion terms, so it's useless. It's better to fit a model with only main effects, interactions and second-degree terms (231 terms in total). I wrote the formula for a standard (non-orthogonal) second-degree polynomial:
`as.formula(paste(" X14 ~ (", paste0(names(Xtrain), collapse="+"), ")^2", collapse=""))`
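On a small toy frame (two made-up predictors, since the real names don't matter), this `paste()` construction expands the way I intend:

```r
# Toy stand-in for Xtrain; the real frame has 21 predictors
Xtrain <- data.frame(X2 = 1:5, X5 = 6:10)

f <- as.formula(paste(" X14 ~ (", paste0(names(Xtrain), collapse = "+"), ")^2",
                      collapse = ""))
f
# X14 ~ (X2 + X5)^2  (main effects plus all pairwise interactions)
```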
Here `Xtrain` is obtained from `train` by deleting the response column `X14`. However, when I try to express the polynomial in an orthogonal basis, I get a parse text error:
```r
as.formula(
  paste(" X14 ~ (", paste0(names(Xtrain), collapse="+"), ")^2", "+",
        paste("poly(", paste0(names(Xtrain), ", degree=2)",
                              collapse="+"),
              collapse="")
  )
)
```
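Printing the inner `paste()` result on a toy two-column `Xtrain` (made-up names) shows where the parse error comes from: only one opening `"poly("` is produced, so the parentheses in the pasted string are unbalanced:

```r
# Toy stand-in for Xtrain
Xtrain <- data.frame(X2 = 1:5, X5 = 6:10)

s <- paste("poly(", paste0(names(Xtrain), ", degree=2)", collapse = "+"),
           collapse = "")
s
# "poly( X2, degree=2)+X5, degree=2)"  <- unbalanced parentheses
```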