I'm working on a project that uses RFECV for feature selection and then fits a ridge regression on the selected variables.
The data set is split into train_y (the dependent variable) and train_x (every other column in the data frame). The same variables work fine in other models.
Here is the code I'm using:
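# Imports (assumed; module paths follow the older scikit-learn API that StratifiedKFold(train_y, 2) implies)
import numpy as np
from scipy.stats import uniform as sp_rand
from sklearn.cross_validation import StratifiedKFold
from sklearn.feature_selection import RFECV
from sklearn.grid_search import RandomizedSearchCV
from sklearn.linear_model import LinearRegression, Ridge
# Variable Selection with RFECV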
lr = LinearRegression()
rfecv = RFECV(estimator = lr, step = 1, cv=StratifiedKFold(train_y, 2), scoring='r2')
selector = rfecv.fit(train_x, train_y)
train_X_new = selector.transform(train_x)
train_Y_new = selector.transform(train_y)
param_grid = {'alpha': sp_rand()}
# create and fit a ridge regression model, testing random alpha values
model = Ridge()
rsearch = RandomizedSearchCV(estimator=model, param_distributions=param_grid, n_iter=100)
rsearch.fit(train_X_new, train_Y_new)
# compare predictions from the fitted search against the target values
expected = train_Y_new
predicted = rsearch.predict(train_X_new)
# summarize the fit of the model
mse = np.mean((predicted-expected)**2)
print("MSE and Model Score: ")
print(mse)
print(rsearch.score(train_X_new, train_Y_new))
The code errors out on this line:
train_Y_new = selector.transform(train_y)
with "ValueError: X has a different shape than during fitting." I can't work out what is causing the error.
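To make this reproducible, here is a minimal sketch on synthetic data. The `make_regression` data set, its sizes, and the use of `KFold` from `sklearn.model_selection` are assumptions for illustration (they assume a recent scikit-learn, not necessarily the version I'm running), so the exact error message may differ. It shows that transforming the feature matrix works, while passing the target to `transform` raises a `ValueError`:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Synthetic stand-in for the real data (assumption: 60 rows, 6 feature columns).
train_x, train_y = make_regression(
    n_samples=60, n_features=6, n_informative=3, noise=1.0, random_state=0
)

# Same selection step as in the snippet above, with a plain 2-fold split.
rfecv = RFECV(estimator=LinearRegression(), step=1, cv=KFold(n_splits=2), scoring="r2")
selector = rfecv.fit(train_x, train_y)

# Transforming the feature matrix works and keeps one row per sample.
train_X_new = selector.transform(train_x)
print(train_x.shape, train_X_new.shape, train_y.shape)

# Passing the target to transform fails; capture the exception to inspect it.
transform_error = None
try:
    selector.transform(train_y)
except ValueError as exc:
    transform_error = exc
print(type(transform_error).__name__, transform_error)
```

The printed shapes show that train_x is two-dimensional with one column per feature, while train_y is one-dimensional.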
Any help/insight is appreciated!
Thanks!