
My dataset is the Cleveland heart disease database, with 300 rows and 14 attributes; the task is to predict whether a person has heart disease. My aim is to build a classification model using logistic regression. I preprocessed the data, split it into X_train, y_train, X_test, y_test, ran the model, and got an average accuracy of about 82%.

To improve the accuracy, I removed features that are highly correlated with each other (since they would carry the same information),

then applied RFE (recursive feature elimination),

followed by PCA (principal component analysis) for dimensionality reduction.
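A sketch of those three steps, using synthetic stand-in data (the column names, the 0.9 correlation threshold, and the component counts are illustrative choices, not the actual code):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Stand-in for the real data: 300 rows, 13 features + target.
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(13)])

# 1) Drop one of each pair of highly correlated features.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
df = df.drop(columns=to_drop)

# 2) Recursive feature elimination with logistic regression as the estimator.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
X_sel = rfe.fit_transform(df, y)

# 3) PCA for dimensionality reduction on the selected features.
X_pca = PCA(n_components=3).fit_transform(X_sel)
```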

Still, the accuracy did not improve.

Why is that?

Also, why does my model show a different accuracy each time? Is it because a different X_train, y_train, X_test, y_test split is taken each time?

Should I change my model for better accuracy? Is an average of 80% good or bad?

Antony Joy

3 Answers

Try an exhaustive grid search or randomized parameter optimization to tune your hyperparameters.

See: Documentation for hyperparameter tuning with sklearn
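A minimal sketch of both search strategies on stand-in data; the parameter ranges here are assumptions, not recommendations:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=13, random_state=0)

# Exhaustive grid search: tries every combination in the grid.
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    {"C": [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X, y)

# Randomized search: samples n_iter settings from the given distributions.
rand = RandomizedSearchCV(LogisticRegression(max_iter=1000),
                          {"C": loguniform(1e-3, 1e2)},
                          n_iter=10, cv=5, random_state=0)
rand.fit(X, y)
```

`grid.best_params_` and `grid.best_score_` then tell you which setting won and its cross-validated accuracy.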

ilpianoforte

Should i change my model for better accuracy?

At least you could try to. The selection of the right model is highly dependent on the concrete use case. Trying out other approaches is never a bad idea :)

Another idea would be to extract the two principal components with the highest variance via PCA. You could then plot the data in 2D space to get a better feeling for whether it is linearly separable.
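A sketch of that plot, with synthetic data standing in for yours (scaling before PCA is a common but assumed choice here):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this if plotting interactively
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=13, random_state=0)

# Project onto the two components carrying the most variance.
X_2d = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.savefig("pca_2d.png")  # or plt.show()
```

If the two classes form clearly separated clouds in this plot, a linear classifier like logistic regression has a good chance.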

Also, why does my model show a different accuracy each time?

I am assuming you are using the train_test_split method of scikit-learn to split your data? By default, this method shuffles your data randomly. You could set the random_state parameter to a fixed value to obtain reproducible results.
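For example (synthetic data as a stand-in):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, random_state=0)

# Fixing random_state makes the split -- and hence the accuracy -- reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Repeating the split with the same seed yields identical sets.
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
assert (X_train == X_train2).all()
```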

  • Hi buddy!! Is there a way to improve accuracy with logistic regression? I tried plotting with PCA, but it seems a bit difficult to separate that linearly... – Antony Joy Mar 04 '21 at 10:56
    Well, as far as I know, logistic regression is a linear classifier, so it works best with linearly separable features (obviously :) ). To resolve this issue you could try to stick to this post: https://stackoverflow.com/questions/55937244/how-to-implement-polynomial-logistic-regression-in-scikit-learn Since I don't know if this helps, I would appreciate it if you let me know whether you could improve your accuracy. – Hackerman443 Mar 04 '21 at 11:10
  • Thanks... the above post does not help me solve my issue with logistic regression, but it gave me a better idea about using non-linear regression. I appreciate it. Thanks :) – Antony Joy Mar 05 '21 at 04:01

See https://github.com/dnishimoto/python-deep-learning/blob/master/Credit%20Card%20Defaults%20-%20hyperparameter.ipynb. To improve accuracy you can apply hyperparameter tuning, dimensionality reduction, and scaling. Hyperparameter tuning finds the best parameters; dimensionality reduction removes features that don't contribute to accuracy, reducing noise; scaling (normalizing) reduces noise in the distribution.

Look at GridSearchCV to find the best parameters.
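The three ideas above can be combined in one pipeline and tuned together; this is a sketch with stand-in data, and the grid values are assumptions, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=13, random_state=0)

# Scaling -> dimensionality reduction -> classifier, tuned as one unit.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])
grid = GridSearchCV(pipe,
                    {"pca__n_components": [5, 10],
                     "clf__C": [0.1, 1, 10]},
                    cv=5)
grid.fit(X, y)
```

Putting the scaler inside the pipeline also prevents test-fold data from leaking into the scaling statistics during cross-validation.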

Golden Lion