Questions tagged [variable-selection]
35 questions
0 votes
0 answers
successive projections algorithm for variable selection in R
I have a dataset of 240 NIR spectra, that is 240 observations of 300 highly correlated variables. I have read that the successive projections algorithm (SPA) is a good way to select informative variables in order to perform further analysis such as…

Míriam Muñoz Lapeira
0 votes
0 answers
"NIL dereference (read) error in WinBUGS when using spike and slab priors for variable selection"
I am trying to perform group variable selection on a random intercept model using spike and slab priors in WinBUGS. However, I think it is trapping and I keep getting the error message "NIL dereference (read)" when running my code. I'm not sure…

Linda A
0 votes
1 answer
Distinguishing between structural & nonstructural regressor candidates in N Lassos run sequentially on N synthetic data sets
This collaborative research project is working towards a second draft of a 2008 working paper that proposed a promising, straightforward yet novel optimal variable selection algorithm in supervised statistical learning. The novel variable…

Marlen
0 votes
1 answer
How to iteratively remove just the intercept terms from the variables selected by n glmnet functions run on n datasets in R
I have run N individual LASSO regressions on N different data sets with the glmnet() function from the package of the same name in RStudio, using the following lines of code:
# This function fits all n LASSO regressions for/on
# each of the…

Marlen
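For the intercept clean-up described in the question above, the idea can be sketched independently of glmnet. This is a Python illustration rather than R, and it assumes each fit has already been reduced to a list of selected term names in which the intercept carries the label "(Intercept)", as in R coefficient row names. The function name and the example selections are hypothetical.

```python
def drop_intercept(selected_per_dataset, intercept_name="(Intercept)"):
    """Return a copy of each selection with the intercept label removed."""
    return [
        [name for name in names if name != intercept_name]
        for names in selected_per_dataset
    ]


# Hypothetical selections from three fits; the variable names are made up.
selections = [
    ["(Intercept)", "X1", "X4"],
    ["(Intercept)", "X2"],
    ["(Intercept)"],
]
cleaned = drop_intercept(selections)
```

The original lists are left untouched, so the raw selections stay available for comparison.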
0 votes
0 answers
R: How does the Lasso for variable selection differ between the packages glmnet and hdm?
For my PhD I use a Lasso approach in R for variable selection. I have used both the glmnet and hdm packages. How does the basic lasso estimator for logistic regression differ between these two packages? I read the docs and also googled a lot but…

Irazall
0 votes
1 answer
Can I include covariates outside of the minimally sufficient set in a causal framework that aren't in the causal pathway?
I am applying a causal method to a cohort study analysis on pollutant exposure and disease X. Based on our understanding of the disease, we believe that aging is the only confounder.
From what I understand, age would be the item in our minimally…
0 votes
0 answers
Selecting important features to perform random forest classification
I have 9 parameters and want to keep the 6 most important and discard the other 3. What is the best method to do it? I have seen some methods of ranking the parameters by recursive feature elimination (e.g. RFECV). Can I use the random forest…

lsr729
0 votes
2 answers
stepAIC() stopping point
I am trying to understand the stopping point of stepAIC(). When using direction = 'backward', does it stop once no further deletion of a term decreases the model AIC? Example as follows:
fm <- lm(mpg ~ ., data = mtcars)
require(MASS)
fit_fm…

cliu
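The greedy stopping rule asked about in the stepAIC() question can be sketched as follows. This is an illustrative Python sketch, not the MASS implementation. It fits ordinary least squares by hand, scores each model with AIC up to an additive constant (n log(RSS/n) + 2p), and stops as soon as no single deletion lowers the score. All function names and the toy data are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Assumes a is square and non-singular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def aic(columns, y):
    """AIC, up to an additive constant, of an OLS fit with an intercept.
    Assumes the residual sum of squares is strictly positive."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)
    rss = sum((y[i] - sum(X[i][j] * beta[j] for j in range(p))) ** 2 for i in range(n))
    return n * math.log(rss / n) + 2 * p


def backward_aic(data, y):
    """Drop one predictor at a time while some deletion lowers the AIC."""
    current = sorted(data)
    best = aic([data[k] for k in current], y)
    while current:
        scores = {k: aic([data[j] for j in current if j != k], y) for k in current}
        drop = min(scores, key=scores.get)
        if scores[drop] >= best:
            break  # no single deletion improves the AIC, so stop here
        best = scores[drop]
        current.remove(drop)
    return current


# Toy data: y depends strongly on x1; x2 is unrelated.
data = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x2": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
}
y = [2.1, 3.8, 6.15, 7.95, 10.05, 11.85, 14.2, 15.9]
kept = backward_aic(data, y)
```

The loop compares the best single deletion against the current model, so the search ends at the first step where keeping everything is at least as good as any removal.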
0 votes
1 answer
R code for augmented backward elimination variable selection
I have a data set with 357 patients. There are about 10-15 potential variables to be selected in the final model. Some of the variables are highly correlated, so I decided to use augmented backward elimination variable selection…

lingyanmeng
0 votes
0 answers
R Explaining Random Forest Variable Selection Sample Code
I have sample code for random forest variable selection. We want to choose the combination of variables with the highest importance and build the random forest model with the lowest OOB error. Can anyone explain the for loop part in the function for me?
…

yueyue
0 votes
0 answers
Should we include or exclude a variable in a logistic regression based on the description below?
Should we include or exclude a variable in a logit regression model if it only takes a value when a certain event occurs and is otherwise N/A?
This variable tells whether or not a product will be bought based on calls made by the company.
the…

shrippi
0 votes
0 answers
R - Using xgboost as feature selection but also interaction selection
Let's say I have a dataset with a lot of variables (more than in the reproducible example below) and I want to build a simple and interpretable model, a GLM.
I can use an xgboost model first, and look at the importance of variables (which depends on the…

demarsylvain
0 votes
1 answer
Variable Selection in R
I'm setting up a model to find the significant variables using variable selection.
str(tweets2)
'data.frame': 6429 obs. of 13 variables:
$ created_at : Factor w/ 5918 levels "1/10/2019 17:40",..: 56
53 52 51 50 49 48 47 46 45 ...
$…

Kasi Perumal
0 votes
2 answers
Automating lm tests over all possible variable combinations and getting values for shapiro.test(), bptest(), vif() in R
I've spent days searching for the optimal models which would fulfill all of the standard OLS assumptions (normal distribution, homoscedasticity, no multicollinearity) in R, but with 12 variables it's impossible to find the optimal variable combination.…

Mapos
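For scale, the search space in the question above is small enough to enumerate. With 12 candidate predictors there are 2^12 - 1 = 4095 non-empty subsets. A minimal Python sketch of the enumeration follows; the predictor names are placeholders, and fitting each candidate and running the diagnostic tests is left out.

```python
from itertools import combinations


def all_subsets(variables):
    """Yield every non-empty combination of the given variable names."""
    for size in range(1, len(variables) + 1):
        for combo in combinations(variables, size):
            yield combo


# Twelve hypothetical predictor names, x1 through x12.
predictors = [f"x{i}" for i in range(1, 13)]
candidates = list(all_subsets(predictors))
print(len(candidates))  # 4095, i.e. 2**12 - 1
```

Each tuple in the result could then be turned into a model formula and passed to whichever diagnostics are of interest.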
0 votes
1 answer
Selecting a column in a dataframe in pandas and applying the min function
I have n dataframes in a list
df=[df_1, df_2, df_3, ...., df_n]
where df_n is a pandas dataframe (Python). df_n is a variable of my Keras model.
X_train=[df_1_1,df_2_1,...,df_n_1]
Where:
df_1_1 is the first dataframe of the list (the first…

Francisco Gonzalez