
I know that there are dozens of similar questions/answers, and lots of papers. But please read to the end.

Non-statisticians tend to use stepwise regression, which statisticians strongly argue against. This is something that I don't understand, but I just obey them: "OK, this is not a good way to do your modelling."

Here is (was) my model:

```r
b <- lmer(metric1 ~ a + b + c + d + e + f + g + h + i + j + k + l +
            (1 | X/Y) + (1 | Z), data = dataset)
drop1(b, test = "Chisq")
```

(Just a small note: note the random effects in my model; the random effects are Year, Month and Sampling.location; one of my variables is binary (1/0); I already log-transformed my variables.)

I am trying to find an exploratory model (using drop1 to reach the final model) and evaluating it with my biological knowledge to see whether the dependent variable ("metric" in this case) seems to be responding to the variables. I will repeat this process with 100 metrics, just to evaluate which metrics seem to be responding to the environmental variables.

I was searching for an acceptable alternative to stepwise selection, following the suggestions of the statistics gurus.

However, there are lots of alternatives. I have read a lot, but I still feel lost. Some say lasso, some say elastic net, some say ridge regression... Which one fits my purpose?
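For what it's worth, a minimal sketch of the lasso with `glmnet` (assuming the predictors `a`..`l` are columns of `dataset`; note that `glmnet` ignores the random-effect structure, so this is only an illustration of the penalty, not a drop-in replacement for the mixed model — for a lasso-penalized mixed model, see e.g. the `glmmLasso` package):

```r
library(glmnet)

# Hypothetical setup: predictors a..l as a numeric matrix, metric1 as response
x <- as.matrix(dataset[, c("a", "b", "c", "d", "e", "f",
                           "g", "h", "i", "j", "k", "l")])
y <- dataset$metric1

# alpha = 1 gives the lasso penalty (alpha = 0 would be ridge,
# values in between give the elastic net)
cv <- cv.glmnet(x, y, alpha = 1)

# Coefficients at the cross-validated lambda; terms shrunk exactly
# to zero have effectively been dropped from the model
coef(cv, s = "lambda.1se")
```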

Any advice on a better alternative, an easy model, a help page for dummies, or examples (which would be even better) would be much appreciated.

Thanks in advance.

borgs
  • You would get more help posting this at [stats.stackexchange.com](http://stats.stackexchange.com). One thing I would suggest in the long run is to [read this book](http://www-bcf.usc.edu/~gareth/ISL/index.html). It is approachable, free to use, and shows how to implement these tools in R. – Phil Apr 02 '17 at 15:26
  • although, with the caveat that I think the statistical learning book focuses on prediction models rather than inference. – user20650 Apr 02 '17 at 15:43
  • @user20650 I am afraid you are right in most cases. I need something specific rather than general approaches. – borgs Apr 02 '17 at 15:47
  • I'm skeptical of using time as a random effect. Time dependency is often associated with autocorrelation. I admit that I have not fully understood what you are trying to achieve, but you might want to consider if you really should assume linearity for the fixed effects. Possibly you should fit a GAMM. Package mgcv offers penalized regression which would shrink your model. – Roland Apr 02 '17 at 18:11
  • An example. Think about the metrics of chemical composition in soil. Rainfall, land usage, farming (etc.) have effects on this composition. I analysed lots of soil samples from different parts of the continent. I gathered data. And I want to see if there is a logical (in an ecological manner) trend between composition and my variables (rainfall, land usage, farming). I'm NOT interested in the numbers in the formula. I just need trends; I mean, for example, in a logical manner you expect more elements in soil with heavy land usage. So I just care about the variables in the model and their trends. – borgs Apr 02 '17 at 18:33
  • Then a GAMM would be perfect for you. If you plot the smoothers you can see your trends and you can even use interaction smoothers. Consider that some of your relationships are expected to follow an optimum curve and not a linear trend. I analyze somewhat similar data using GAM/GAMM. – Roland Apr 03 '17 at 07:37
  • @Roland Thank you very much, I will dig for more info on GAM/GAMM. Is it possible to show significance in GAM/GAMM to find out which models are better and/or to compare models? And secondly, does GAM exclude non-significant variables or include all of them? By the way, do you have an opinion on `glmulti`? – borgs Apr 03 '17 at 09:07
  • `mgcv::gam` can shrink smoothers to a linear or even constant (thereby removing them) relationship. It uses penalized regression. You get p-values for every parametric coefficient and every smoother term. – Roland Apr 03 '17 at 09:25
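A minimal sketch of the penalized GAM approach Roland describes, assuming hypothetical continuous predictors `rainfall` and `land_use` and random effects `Year` and `Z` (names not taken from the original model):

```r
library(mgcv)

# bs = "ts" requests thin-plate shrinkage smoothers: the penalty can
# shrink a smooth to linear or even to zero, effectively removing it.
# select = TRUE adds an extra penalty with the same effect for the
# default basis. Random effects enter as bs = "re" terms.
m <- gam(metric1 ~ s(rainfall, bs = "ts") + s(land_use, bs = "ts") +
           s(Year, bs = "re") + s(Z, bs = "re"),
         data = dataset, method = "REML", select = TRUE)

summary(m)  # approximate p-values for each smooth and parametric term
plot(m)     # plot the fitted smoothers to inspect the trends
```

Plotting the smoothers shows the trend of each variable directly, which matches the stated goal of caring about trends rather than coefficients.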

0 Answers