
I am doing sentiment analysis and trying to get feature importances from a logistic regression model. I found a reference (How to get feature importance in logistic regression using weights?) that shows how to do it, but my implementation raises an error, and I don't understand why or how to fix it.

Can someone help me?

Here is my code:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler


## Creating Training data
Independent_var = df_final.tweet # the features
Dependent_var = df_final.sent_binary # the sentiment (positive, negative, neutral)

# Logistic regression
cv = CountVectorizer(min_df=2, max_df=0.50, ngram_range = (1,2), max_features=50)
text_count_vector = cv.fit_transform(Independent_var)
#standardized_data = StandardScaler(with_mean=False).fit_transform(text_count_vector)

feature_names = np.array(cv.get_feature_names())
#feature_names

## Splitting in the given training data for our training and testing
X_tr, X_test, y_tr, y_test = train_test_split(text_count_vector, Dependent_var, test_size=0.3, random_state=225)


LogReg = LogisticRegression(solver='lbfgs', multi_class='multinomial')
LogReg_clf = LogReg.fit(X_tr, y_tr)

#coefs = np.abs(LogReg_clf.coef_)
coefs = LogReg_clf.coef_


#get the sorting indices
sorted_index = np.argsort(coefs)[::-1]
# check if the sorting indices are correct
print(coefs[sorted_index])

#get the index of the top-20 features
top_20 = sorted_index[:20]

#get the names of the top 20 most important features
print(feature_names[top_20])

The error I get:

IndexError                                Traceback (most recent call last)
<ipython-input-103-b566f1c5a21c> in <module>
     22 print(sorted_index)
     23 # check if the sorting indices are correct
---> 24 print(coefs[sorted_index])
     25 
     26 #get the index of the top-20 features

IndexError: index 23 is out of bounds for axis 0 with size 3
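
The "size 3" in the message suggests that `coefs` is two-dimensional, with one row per sentiment class. The sketch below tries to reproduce the error under that assumption and shows one possible way to pick the top features per class. The array shape (3, 50), the random coefficients, the `feat_i` names, and the helper `top_k_per_class` are all illustrative stand-ins, not the data or code from the question.

```python
import numpy as np

# Synthetic stand-ins (assumptions): 3 classes x 50 features, matching
# max_features=50 in the question and the "size 3" in the error message.
rng = np.random.default_rng(0)
coefs = rng.normal(size=(3, 50))
feature_names = np.array([f"feat_{i}" for i in range(50)])

# On a 2-D array, argsort sorts along the last axis, so the result is also
# (3, 50) and holds column indices 0..49; [::-1] only reverses the row order.
sorted_index = np.argsort(coefs)[::-1]
print(sorted_index.shape)

# Using those column indices to index axis 0 (size 3) raises IndexError.
try:
    coefs[sorted_index]
    raised = False
except IndexError:
    raised = True
print("IndexError raised:", raised)


def top_k_per_class(coefs, feature_names, k=20):
    """For each class row, return the k feature names with the largest coefficients."""
    # Sort each row ascending, reverse the columns, keep the first k.
    order = np.argsort(coefs, axis=1)[:, ::-1][:, :k]
    return feature_names[order]  # shape (n_classes, k)


top = top_k_per_class(coefs, feature_names, k=20)
print(top.shape)
```

With a fitted model, the same helper could be called as `top_k_per_class(LogReg_clf.coef_, feature_names)`. Sorting `np.abs(coefs)` instead would rank features by magnitude regardless of sign, which may or may not be what is wanted here.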

Maestra
  • That is some complex code for anyone unfamiliar with those libraries. Could you test different parts of it, especially those parts relating to line 24, and cut it down to a [mre]? – Lyndon Gingerich Apr 14 '21 at 20:22
  • It might also help to tag your question with the relevant libraries. – Lyndon Gingerich Apr 14 '21 at 20:28
  • Rather than using `argsort` to get the indices sorted and then using those indices to index your coefficients, why not just directly sort the coefficients? – G. Anderson Apr 14 '21 at 20:44
  • Yes, I tested different parts of it. I ran everything before it line by line, and it all works. Only line 24 fails. – Maestra Apr 14 '21 at 20:49
