I am training a model to detect Good/Bad clients. My input features are:

'Net Receivables', 'Sales', 'Cost of Goods sold', 'Current Assets',
       'Property, plant and equipment', 'Securities', 'Total assets',
       'Depreciation', 'Selling, General & Administrative Expense',
       'Total long term debt', 'Current Liabilites', 'Net Receivables.1',
       'Sales.1', 'Cost of Goods sold.1', 'Current Assets.1',
       'Property, plant and equipment.1', 'Securities.1', 'Total assets.1',
       'Depreciation.1', 'Selling, General & Administrative Expense.1',
       'Total long term debt.1', 'Current Liabilites.1',
       'Income from Continuing Operations', 'Cash Flows from Operations'

I trained a simple model using Logistic Regression:

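from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression()
clf.fit(X_train, y_train)
pred = clf.predict(X_test)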

Then I evaluated the model using AUC and accuracy:

from sklearn.metrics import accuracy_score, roc_auc_score

print(roc_auc_score(y_test, pred))
print(accuracy_score(y_test, pred))

The result is

0.765625
0.7727272727272727

But when I tried to evaluate the feature importance with

odds = np.exp(clf.coef_[0])

I got some strange values. Every odds ratio is almost exactly 1, so no feature appears to be more significant than the others:

array([1.00000001, 1.00000035, 0.99999963, 0.99999987, 0.99999928,
       1.        , 1.        , 0.99999993, 1.00000019, 0.9999994 ,
       0.99999976, 1.00000016, 0.99999996, 1.00000003, 0.99999967,
       0.99999967, 1.        , 1.00000035, 0.99999995, 0.99999985,
       1.00000035, 1.00000021, 1.00000008, 1.00000051])

My training set is relatively small: 174 rows * 24 features.

Can I trust the score of the model?
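One way to check how stable a score is on a data set this small is stratified k-fold cross-validation, which reports a spread across folds rather than a single number from one split. The sketch below is only an illustration: it uses synthetic data from `make_classification` as a stand-in for the real 174 × 24 table, and the fold count and `max_iter` value are assumptions. With `scoring="roc_auc"`, the AUC is computed from continuous scores rather than from the hard 0/1 labels returned by `predict`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the 174 x 24 data set (an assumption, not the real data).
X, y = make_classification(
    n_samples=174, n_features=24, n_informative=6, random_state=42
)

# Five stratified folds keep the Good/Bad ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
clf = LogisticRegression(max_iter=1000)

# One AUC value per fold, computed from the classifier's continuous scores.
auc_scores = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print(auc_scores.mean(), auc_scores.std())
```

A large standard deviation across folds would suggest that a single train/test split is not a reliable estimate.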

JOHN

1 Answer

Why do you use np.exp?

And why do you use coef_[0]? The usual way to get the coefficients of your logistic regression is:

print(clf.coef_, clf.intercept_)

This is also described in this post.
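For reference, here is a small sketch of how `coef_`, `coef_[0]` and `np.exp` relate for a binary problem. The data is synthetic (an assumption, generated with `make_classification`), so only the shapes carry over to the real model, not the values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data with 24 features (an assumption, not the asker's data).
X, y = make_classification(n_samples=174, n_features=24, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# For a binary target, coef_ is a 2-D array with a single row.
print(clf.coef_.shape)       # (1, 24)
print(clf.intercept_.shape)  # (1,)

# coef_[0] is that single row as a 1-D array of log odds;
# exponentiating it gives the odds ratio per unit of each feature.
log_odds = clf.coef_[0]
odds_ratios = np.exp(log_odds)
```

So `coef_[0]` and `coef_` hold the same numbers, and `np.exp` only changes the scale on which they are read.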

PV8
  • I followed this link: https://towardsdatascience.com/interpreting-coefficients-in-linear-and-logistic-regression-6ddf1295f6f1. It's because the coefficients are log odds. – JOHN Nov 17 '20 at 12:47
  • np.exp shouldn't make any big difference then, but you should try `clf.coef_` – PV8 Nov 17 '20 at 13:06
  • I think the problem is that the financial data is in units of 1 dollar, so the coefficients are small. – JOHN Nov 20 '20 at 07:08
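
If the unit really is the cause, standardizing the features before fitting should make the coefficients comparable. A coefficient of about 3.5e-7 per dollar exponentiates to roughly 1.00000035, which matches the values in the question. The following is a minimal sketch under assumptions: the data is synthetic and multiplied by 1e6 to imitate dollar amounts, and the pipeline with `StandardScaler` is one possible approach, not something used in the original post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the financial data (assumption): 174 rows, 24 features,
# multiplied by 1e6 to mimic amounts expressed in single dollars.
X, y = make_classification(
    n_samples=174, n_features=24, n_informative=6, random_state=42
)
X_dollars = X * 1e6

# Standardize each feature to zero mean and unit variance before fitting.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_dollars, y)

# Coefficients are now log odds per one standard deviation of each feature.
scaled_coef = model.named_steps["logisticregression"].coef_[0]
odds_ratios = np.exp(scaled_coef)
print(np.round(odds_ratios, 3))
```

After scaling, an odds ratio describes the effect of a one-standard-deviation change in a feature instead of a one-dollar change, so the values can be compared across features.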