While doing some sentiment analysis, I am trying to get feature importance from a logistic regression model. I found a reference here (How to get feature importance in logistic regression using weights?) on how to do it, but my implementation raises an error and I don't know why or how to fix it.
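For context, the weight-based approach ranks features by their fitted coefficients. In scikit-learn, a binary model's `coef_` has shape `(1, n_features)`, so taking row 0 gives a 1-D vector that `np.argsort` can order directly. A minimal sketch, assuming a binary model; the feature names and weights are made up and stand in for `cv.get_feature_names_out()` and `LogReg_clf.coef_`:

```python
import numpy as np

# Hypothetical stand-ins for cv.get_feature_names_out() and LogReg_clf.coef_
# from a *binary* model, whose coef_ has shape (1, n_features).
feature_names = np.array(["good", "bad", "great", "awful", "ok"])
coef = np.array([[1.2, -2.0, 0.8, -0.5, 0.1]])

weights = coef[0]                          # 1-D vector, shape (n_features,)
order = np.argsort(weights)[::-1]          # indices from largest to smallest weight
top_positive = feature_names[order[:2]]    # strongest evidence for the positive class

# Ranking by magnitude also treats large negative weights as important.
order_abs = np.argsort(np.abs(weights))[::-1]
top_any = feature_names[order_abs[:2]]

print(top_positive)
print(top_any)
```

Comparing coefficient magnitudes across features is most meaningful when the features are on comparable scales, for example after standardization.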
Can someone help me?
Here is my code:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler
## Creating Training data
Independent_var = df_final.tweet # the features
Dependent_var = df_final.sent_binary # the sentiment (positive, negative, neutral)
# Logistic regression
cv = CountVectorizer(min_df=2, max_df=0.50, ngram_range=(1, 2), max_features=50)
text_count_vector = cv.fit_transform(Independent_var)
#standardized_data = StandardScaler(with_mean=False).fit_transform(text_count_vector)
feature_names = np.array(cv.get_feature_names_out())  # get_feature_names() was removed in scikit-learn 1.2
#feature_names
## Splitting in the given training data for our training and testing
X_tr, X_test, y_tr, y_test = train_test_split(text_count_vector, Dependent_var, test_size=0.3, random_state=225)
LogReg = LogisticRegression(solver='lbfgs', multi_class='multinomial')
LogReg_clf = LogReg.fit(X_tr, y_tr)
#coefs = np.abs(LogReg_clf.coef_)
coefs = LogReg_clf.coef_  # 2-D: one row of weights per class, shape (n_classes, n_features) for 3+ classes
#get the sorting indices
sorted_index = np.argsort(coefs)[::-1]
# check if the sorting indices are correct
print(coefs[sorted_index])
#get the index of the top-20 features
top_20 = sorted_index[:20]
#get the names of the top 20 most important features
print(feature_names[top_20])
The error I get:
IndexError Traceback (most recent call last)
<ipython-input-103-b566f1c5a21c> in <module>
22 print(sorted_index)
23 # check if the sorting indices are correct
---> 24 print(coefs[sorted_index])
25
26 #get the index of the top-20 features
IndexError: index 23 is out of bounds for axis 0 with size 3
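Judging by the traceback, the problem is the shape of `coef_`: with three sentiment classes it is 2-D, shape `(3, n_features)`. `np.argsort` sorts along the last axis by default, so `sorted_index` has the same 2-D shape and holds feature indices. `[::-1]` then reverses the class rows rather than the order within each row, and `coefs[sorted_index]` uses feature indices such as 23 to index the class axis, which only has size 3. A minimal sketch that reproduces this and then ranks features per class, using small hypothetical arrays in place of the fitted model's attributes:

```python
import numpy as np

# Hypothetical stand-ins for LogReg_clf.classes_, cv.get_feature_names_out()
# and LogReg_clf.coef_ from a 3-class model; coef_ is (n_classes, n_features).
classes = np.array(["negative", "neutral", "positive"])
feature_names = np.array(["bad", "fine", "good", "hate", "love"])
coefs = np.array([
    [1.5, -0.2, -1.0, 2.0, -1.8],
    [-0.3, 1.1, 0.0, -0.4, 0.1],
    [-1.2, -0.9, 1.0, -1.6, 1.7],
])

# Reproduce the original problem: argsort sorts each row, so sorted_index holds
# feature indices (0..4); [::-1] flips the class rows, and indexing axis 0
# (size 3) with those feature indices raises IndexError.
sorted_index = np.argsort(coefs)[::-1]
reproduced = False
try:
    coefs[sorted_index]
except IndexError as err:
    reproduced = True
    print("IndexError:", err)

# Per-class ranking: sort along the feature axis, largest weight first.
k = 2
order = np.argsort(coefs, axis=1)[:, ::-1]              # (n_classes, n_features)
top_k = order[:, :k]                                    # top-k feature indices per class
top_weights = np.take_along_axis(coefs, top_k, axis=1)  # matching weights per class

top_features = {}
for label, idx, w in zip(classes, top_k, top_weights):
    top_features[str(label)] = feature_names[idx].tolist()
    print(label, feature_names[idx], w)
```

With the real model, the stand-ins would be replaced by `LogReg_clf.classes_`, `feature_names` and `LogReg_clf.coef_`, with `k = 20`. If a single ranking across all classes is wanted instead, one option (a design choice, not a standard) is to collapse the class axis first, for example with `np.abs(coefs).max(axis=0)`, and then rank that 1-D vector.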