I am working on a project in R to select the best-fitting model.
I have 15 variables and a sample size of 790,000. A plain linear model does not work because the residuals are neither random nor normally distributed.
So I tried fitting a model with higher-order polynomial and interaction terms. However, R is extremely slow and crashes from time to time because the dataset is so large.
I tried the stepwise function (step) and the polym function, but neither was ideal. Is there a function or package for high-order polynomials and interactions? If I were to write a loop, how would I check the normality and randomness of the residuals for each candidate model without looking at plots? (The Shapiro-Wilk test doesn't work because the sample size is too large.) Thank you so much!
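To make the loop idea concrete, here is a rough sketch of the kind of plot-free check I have in mind. The helper name `resid_summary`, the choice of statistics, and the simulated data are all my own assumptions for illustration, not an established procedure and not my real data. It reports skewness, excess kurtosis, the lag-1 autocorrelation of the residuals in row order, and the Shapiro-Wilk W statistic on a random subsample (shapiro.test accepts at most 5000 values).

```r
# Hypothetical helper: numeric residual diagnostics that need no plot.
resid_summary <- function(fit, n_sub = 5000, seed = 1) {
  r <- residuals(fit)
  z <- (r - mean(r)) / sd(r)                 # standardized residuals
  set.seed(seed)                             # note: resets the global RNG state
  sub <- sample(r, min(n_sub, length(r)))    # shapiro.test() allows 3 to 5000 values
  c(
    skewness        = mean(z^3),             # about 0 for a symmetric distribution
    excess_kurtosis = mean(z^4) - 3,         # about 0 for a normal distribution
    lag1_autocorr   = cor(r[-1], r[-length(r)]),  # about 0 if row order carries no pattern
    shapiro_W_sub   = unname(shapiro.test(sub)$statistic)  # W on the subsample only
  )
}

# Simulated data (assumption, for illustration only)
set.seed(42)
d <- data.frame(x = rnorm(20000))
d$y <- 1 + 2 * d$x + rnorm(20000)
fit <- lm(y ~ x, data = d)

res <- resid_summary(fit)
print(round(res, 3))
```

Inside a loop, I would collect one such vector per candidate model and compare them in a table. I am not sure these are the right statistics, and the lag-1 autocorrelation only means something if the row order is meaningful.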
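Update: here is the code I am running.
# Upper scope: degree-5 polynomial (with interactions) in 13 predictors, plus Gender and LT
fit2b <- lm(Assets ~ polym(C, Suc, SP, SS, Qual_P, A, TotalAA, Eq, D, PE, EI, GE, EO, degree = 5, raw = TRUE) + Gender + LT, data = f)
# Start from a single predictor and search forward toward fit2b
fit1b <- lm(Assets ~ A, data = f)
step(fit1b, scope = list(upper = formula(fit2b), lower = ~1), direction = "forward", trace = FALSE)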
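For scale, here is a back-of-the-envelope count of what the polym call above asks for. It assumes polym generates every monomial of total degree 1 to 5 in the 13 variables, and that the design matrix is stored densely in double precision (8 bytes per entry). The variable names below are mine, for illustration.

```r
n_obs  <- 790000   # sample size from my data
n_vars <- 13       # variables inside polym()
degree <- 5

# Number of monomials of total degree 1..degree in n_vars variables
n_terms <- choose(n_vars + degree, degree) - 1

# Dense double-precision design matrix: 8 bytes per entry
gb <- n_obs * n_terms * 8 / 1e9

cat("polynomial columns:", n_terms, "\n")
cat("approx. design matrix size (GB):", round(gb, 1), "\n")

# Sanity check of the counting formula against polym() on a tiny example
x <- matrix(rnorm(30), ncol = 3)
p <- polym(x[, 1], x[, 2], x[, 3], degree = 2, raw = TRUE)
stopifnot(ncol(p) == choose(3 + 2, 2) - 1)
```

If my arithmetic is right, that is 8,567 polynomial columns and roughly 54 GB for the model matrix alone, which may be why R keeps shutting down.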
Also, are there any other tools for detecting multicollinearity besides VIF, and how should I adjust the model to address it?
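One alternative to VIF that I have come across is looking at the eigenvalues of the predictor correlation matrix and the condition number of the standardized predictors. Below is a minimal sketch using only base R on simulated data (not my real data), where x3 is deliberately built as an almost exact linear combination of x1 and x2. I am not sure this is the best approach, so corrections are welcome.

```r
# Simulated predictors (assumption, for illustration only)
set.seed(1)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.01)   # nearly collinear with x1 and x2
X  <- cbind(x1, x2, x3)

# Eigenvalues of the correlation matrix: values near 0 flag near-dependence
ev <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values

# Condition number of the standardized predictors:
# sqrt(largest eigenvalue / smallest eigenvalue)
cond_num <- sqrt(max(ev) / min(ev))

# The same quantity from base R's kappa() with exact = TRUE
cond_kappa <- kappa(scale(X), exact = TRUE)

print(ev)
print(c(cond_num = cond_num, cond_kappa = cond_kappa))
```

A commonly quoted rule of thumb treats a condition number above about 30 as a warning sign, but I do not know how well that carries over to a model with thousands of polynomial terms.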