I would like to fit a Random Forest model to identify the most important predictors of species relative abundance, i.e. the predictors that explain the most variation. df1 holds the relative abundances of 20 species (Sp1 - Sp20), and df2 holds the values of 38 predictor variables (Var1 - Var38); the two share an ID column. The variables have already been transformed to reduce skew. What code would I use to run this?

     df1             
     ID  Sp1      Sp2     Sp3     etc.
     1    34      22      34      
     2     3      25      54      
     3    87      68      14     
     4    66      98      98     
     5    55      13      77      


     df2             
     ID  Var1    Var2    Var3    etc.
     1    -0.082   1      290      
     2    -0.094   0      301      
     3    -0.322   1      400     
     4    -0.123   0      555     
     5    -0.457   0      321
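
One possible shape for such an analysis, as a minimal sketch only: the question does not name a language, so Python with pandas and scikit-learn is an assumption here, and the data below are synthetic stand-ins generated for illustration (not the real df1/df2). The sketch joins the two tables on `ID`, fits one `RandomForestRegressor` per species, and collects the impurity-based importances into a predictors-by-species table.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-ins for the real tables (assumption: both share an "ID" column).
rng = np.random.default_rng(0)
n = 60
df2 = pd.DataFrame(rng.normal(size=(n, 4)),
                   columns=["Var1", "Var2", "Var3", "Var4"])
df2.insert(0, "ID", np.arange(1, n + 1))
df1 = pd.DataFrame({
    "ID": np.arange(1, n + 1),
    # Sp1 is built to depend on Var1, Sp2 on Var3, so the result can be checked.
    "Sp1": 50 + 10 * df2["Var1"] + rng.normal(scale=1.0, size=n),
    "Sp2": 50 - 8 * df2["Var3"] + rng.normal(scale=1.0, size=n),
})

# Align abundances and predictors row by row via the shared ID.
merged = df1.merge(df2, on="ID")
species = [c for c in df1.columns if c != "ID"]
predictors = [c for c in df2.columns if c != "ID"]

# Fit one forest per species and keep its importances (they sum to 1 per species).
importance = {}
for sp in species:
    rf = RandomForestRegressor(n_estimators=500, random_state=0)
    rf.fit(merged[predictors], merged[sp])
    importance[sp] = pd.Series(rf.feature_importances_, index=predictors)

importance = pd.DataFrame(importance)  # rows: predictors, columns: species
print(importance.round(3))
```

Impurity-based importances can favour predictors with many distinct values, so `sklearn.inspection.permutation_importance` on held-out data may be worth comparing against. With only a handful of rows, as in the excerpt above, any ranking would be unreliable.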
AlexP
  • Welcome to SO! Folks will be better equipped to help you if you produce a minimal reproducible example, with some toy data. Here's a guide on how to do so https://stackoverflow.com/help/minimal-reproducible-example – Dij Sep 04 '19 at 20:50
  • Note on *"Variables have already been transformed to reduce skewed distributions."*: since decision trees look for cut points rather than fitting coefficients, invertible monotonic transformations of the predictors, e.g., `log`, `sqrt`, etc., don't really do anything. – Gregor Thomas Sep 04 '19 at 21:11
  • Gregor, can you send me a link to where this question has been answered please? – AlexP Sep 05 '19 at 15:55
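
The point in the comment about monotonic transformations can be checked directly. The sketch below is an illustration in plain Python, not a library implementation: `best_split` is a hypothetical helper that searches for the single cut point minimising the squared error, as a regression tree does at each node. It uses the five `Var3` and `Sp1` values shown in the question and compares the split found on the raw predictor with the one found on its logarithm.

```python
import math

def best_split(x, y):
    """Return (rows sent left, SSE) for the best single cut point on x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_sse, best_left = float("inf"), None
    for k in range(1, len(order)):
        # A cut is only possible between two distinct predictor values.
        if x[order[k]] == x[order[k - 1]]:
            continue
        left, right = order[:k], order[k:]
        sse = 0.0
        for group in (left, right):
            mean = sum(y[i] for i in group) / len(group)
            sse += sum((y[i] - mean) ** 2 for i in group)
        if sse < best_sse:
            best_sse, best_left = sse, frozenset(left)
    return best_left, best_sse

var3 = [290, 301, 400, 555, 321]  # Var3 column from df2 in the question
sp1 = [34, 3, 87, 66, 55]         # Sp1 column from df1 in the question

raw = best_split(var3, sp1)
logged = best_split([math.log(v) for v in var3], sp1)

# The same rows fall on each side of the cut, so the tree is unchanged.
print(raw == logged)
```

Because `log` preserves the ordering of the values, the candidate partitions are identical and so is the chosen one. This holds for the predictors only; transforming the response changes the errors being minimised and can change the fit.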

0 Answers