I am doing panel data analysis with 17 variables in R using the package "plm".
I need to eliminate some of these variables while retaining the most significant ones. I am using adjusted R-squared to find the set of variables that best explains my dependent variable. With 17 variables, refitting the model and inspecting the results by hand again and again has become cumbersome. My code is as follows:
attach(pdf)
pdata <-plm.data(pdf,index=c("country","day"))
Y <- cbind(DEP_var)
var_list <- pdf[c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q")]
between_models= list()
R_Sqrt=c()
for(i in 1:17){
X<-cbind(var_list[,1:i])
between_models[i]=plm(Y~ X, data=pdata, model= "between")
R_Sqrt[i]=coef(between_models[i])["Adj. R-Squared"]
}
print(paste("Highest Adj. R-Squared is at model", which.max(R_Sqrt)))
print(between_models[[which.max(R_Sqrt)]]) # print the model with the highest Adj. R-Squared
What I am trying to do with the above code is to increase the number of variables in X and re-estimate the between model each time, until X contains all of the variables. Then I want to look at the list of adjusted R-squared values and pick the summary for the model with the highest adjusted R-squared. When I run the above code, it gives the following error:
Error in model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data = data, : invalid type (list) for variable 'X'
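A quick check in base R suggests where the "list" in the message comes from. This is a minimal sketch on a hypothetical three-column data frame (`df`, `X1`, and `X2` are illustrative names, not part of my data): selecting one column drops to a vector, so `cbind()` returns a matrix, but selecting two or more columns keeps a data frame, and a data frame is a list.

```r
# Hypothetical toy data frame, standing in for var_list
df <- data.frame(A = 1:3, B = 4:6, C = 7:9)

# i = 1: df[, 1:1] drops to a plain vector, so cbind() yields a matrix
X1 <- cbind(df[, 1:1])

# i = 2: df[, 1:2] stays a data frame, and cbind() of a single
# data frame returns a data frame (which is a list)
X2 <- cbind(df[, 1:2])

print(class(X1))
print(class(X2))
print(is.list(X2))
```

If this is what happens in the loop, the first iteration would get past the formula step and the error would appear from the second iteration on.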
In the for loop above, the problem seems to be the type of the variable X. Please suggest how to fix it so that the loop runs properly and returns the model with the highest adjusted R-squared.
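One possible way around the type problem is to build the formula from the variable names instead of passing a data frame as X. The following is a minimal sketch under stated assumptions, not a definitive solution: the data are simulated (`pdf_toy`, with only four predictors `A` to `D`), `pdata.frame()` is used in place of `plm.data()` (which, as far as I know, is deprecated in later plm releases), and the adjusted R-squared is read from the `r.squared` element of `summary()`.

```r
library(plm)

# Hypothetical toy panel standing in for the real data:
# 30 countries observed over 10 days, four candidate predictors.
set.seed(1)
n_country <- 30
n_day <- 10
pdf_toy <- data.frame(
  country = rep(sprintf("c%02d", 1:n_country), each = n_day),
  day     = rep(1:n_day, times = n_country)
)
vars <- c("A", "B", "C", "D")
for (v in vars) pdf_toy[[v]] <- rnorm(nrow(pdf_toy))
pdf_toy$DEP_var <- 2 * pdf_toy$A - pdf_toy$B + rnorm(nrow(pdf_toy))

pdata <- pdata.frame(pdf_toy, index = c("country", "day"))

between_models <- vector("list", length(vars))
adj_r2 <- numeric(length(vars))

for (i in seq_along(vars)) {
  # Build DEP_var ~ A, DEP_var ~ A + B, ... from the names
  f <- reformulate(vars[1:i], response = "DEP_var")
  # Note [[i]] rather than [i] when storing a model in a list
  between_models[[i]] <- plm(f, data = pdata, model = "between")
  # summary()$r.squared is a named vector with "rsq" and "adjrsq"
  adj_r2[i] <- summary(between_models[[i]])$r.squared["adjrsq"]
}

best <- which.max(adj_r2)
print(paste("Highest Adj. R-Squared is at model", best))
print(summary(between_models[[best]]))
```

Note that this only compares the nested sets A, A + B, A + B + C, and so on, in the order the names are listed; it does not search over all subsets. Also, because the between model is fitted on country means, the number of countries has to exceed the number of regressors for all 17 variables to be estimable.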