I am training a model to detect Good/Bad clients. My input features are:
'Net Receivables', 'Sales', 'Cost of Goods sold', 'Current Assets',
'Property, plant and equipment', 'Securities', 'Total assets',
'Depreciation', 'Selling, General & Administrative Expense',
'Total long term debt', 'Current Liabilites', 'Net Receivables.1',
'Sales.1', 'Cost of Goods sold.1', 'Current Assets.1',
'Property, plant and equipment.1', 'Securities.1', 'Total assets.1',
'Depreciation.1', 'Selling, General & Administrative Expense.1',
'Total long term debt.1', 'Current Liabilites.1',
'Income from Continuing Operations', 'Cash Flows from Operations'
I trained a simple model using logistic regression:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = LogisticRegression()
clf.fit(X_train, y_train)
pred = clf.predict(X_test)  # hard 0/1 class labels
Then I evaluate the model with AUC and accuracy:

from sklearn.metrics import roc_auc_score, accuracy_score

print(roc_auc_score(y_test, pred))  # note: pred holds hard labels, not probabilities
print(accuracy_score(y_test, pred))
The result is
0.765625
0.7727272727272727
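I later realized that roc_auc_score expects continuous scores rather than the 0/1 labels that predict() returns, so the AUC above may understate the model's ranking quality. A sketch of the corrected evaluation, using synthetic stand-in data since my real dataset isn't reproduced here:

```python
# Sketch: pass probabilities, not hard labels, to roc_auc_score.
# Synthetic stand-in data (my real 174-row dataset isn't shown).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=174, n_features=24, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# predict() -> hard 0/1 labels; predict_proba() -> class probabilities.
auc_labels = roc_auc_score(y_test, clf.predict(X_test))
auc_scores = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(auc_labels, auc_scores)
```

With hard labels the "AUC" collapses to balanced accuracy, so the probability-based version is the one to report.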
But when I try to evaluate feature importance by exponentiating the coefficients into odds ratios:

import numpy as np

odds = np.exp(clf.coef_[0])

the values look strange: every odds ratio is essentially 1, so no feature appears any more important than the others.
array([1.00000001, 1.00000035, 0.99999963, 0.99999987, 0.99999928,
1. , 1. , 0.99999993, 1.00000019, 0.9999994 ,
0.99999976, 1.00000016, 0.99999996, 1.00000003, 0.99999967,
0.99999967, 1. , 1.00000035, 0.99999995, 0.99999985,
1.00000035, 1.00000021, 1.00000008, 1.00000051])
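I suspect the near-1 odds ratios come from my features being raw financial amounts on huge, very different scales, so a one-unit (one-dollar) increase barely moves the log-odds. A sketch of standardizing first, on synthetic stand-in data since my real dataset isn't shown, assuming a scikit-learn Pipeline:

```python
# Sketch: standardize features so coefficients are comparable across features.
# Synthetic stand-in data (my real 174x24 dataset isn't reproduced here).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=174, n_features=24, random_state=42)
X = X * 1e6  # blow up the scales to mimic raw financial-statement values

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X, y)

# After scaling, each coefficient is "change in log-odds per one standard
# deviation of that feature", so magnitudes can be compared across features.
coef = pipe.named_steps["logisticregression"].coef_[0]
odds = np.exp(coef)
print(odds.round(3))
```

On standardized inputs the informative features should now show odds ratios clearly different from 1 instead of the flat array above.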
My training set is relatively small: 174 rows × 24 features.
Can I trust the model's scores?
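With only 174 rows, an 80/20 split leaves roughly 35 test cases, so a single score has high variance. One way I could check how much to trust it is repeated stratified cross-validation and looking at the spread of AUCs (again a sketch on synthetic stand-in data):

```python
# Sketch: estimate the variability of the AUC on a small dataset.
# Synthetic stand-in data (my real dataset isn't reproduced here).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=174, n_features=24, random_state=42)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=42)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")

# A wide spread here would mean a single train/test split can mislead.
print(f"AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```

If the fold-to-fold standard deviation is large relative to the mean, the single-split 0.77 above could easily be luck.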