I am developing a model that predicts whether an employee stays in their job or leaves the company.
The features are listed below:
- satisfaction_level
- last_evaluation
- number_projects
- average_monthly_hours
- time_spend_company
- work_accident
- promotion_last_5years
- Department
- salary
- left (boolean, the target)
During feature analysis, I tried two approaches, and they gave me different results for the features, as shown in the image here.
When I plot a correlation heatmap, it shows that satisfaction_level has a negative correlation with left.
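For reference, the numbers behind such a heatmap can be reproduced roughly as follows. This is only a minimal sketch: the data frame is synthetic stand-in data generated for illustration (not the real dataset), and the effect sizes baked into it are assumptions chosen so that the signs match what the heatmap shows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the real HR dataset is not reproduced here.
rng = np.random.default_rng(0)
n = 1000
left = rng.integers(0, 2, size=n)
df = pd.DataFrame({
    # Assumption for illustration: leavers have lower satisfaction on average.
    "satisfaction_level": np.clip(0.7 - 0.25 * left + rng.normal(0, 0.2, n), 0, 1),
    # Assumption for illustration: leavers have slightly longer tenure.
    "time_spend_company": np.clip(np.round(3 + 0.6 * left + rng.normal(0, 1.4, n)), 1, 10),
    "left": left,
})

# Pearson correlation of every feature with the target column.
# This is the "left" column of the matrix a heatmap would display.
corr_with_left = df.corr()["left"].drop("left")
print(corr_with_left)
```

Passing the full `df.corr()` matrix to `seaborn.heatmap(..., annot=True)` would draw the plot itself; only the numbers are computed here.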
On the other hand, when I use pandas alone for the analysis, I get results like this:
In the above image, satisfaction_level appears quite important, since employees with a higher satisfaction_level tend to stay in their jobs.
For time_spend_company, however, the heatmap suggests it is important, while the difference in the second image is small.
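The pandas comparison is roughly of the following kind. This is a sketch under two assumptions: that the analysis was a `groupby` on left followed by `mean()` (the image is what I actually looked at), and that the data looks like the synthetic stand-in generated here rather than the real dataset. It also prints the gap between the groups divided by each feature's standard deviation, since the raw gap is in each feature's own units and may not be directly comparable across features.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the real HR dataset is not reproduced here.
rng = np.random.default_rng(0)
n = 1000
left = rng.integers(0, 2, size=n)
df = pd.DataFrame({
    "satisfaction_level": np.clip(0.7 - 0.25 * left + rng.normal(0, 0.2, n), 0, 1),
    "time_spend_company": np.clip(np.round(3 + 0.6 * left + rng.normal(0, 1.4, n)), 1, 10),
    "left": left,
})

features = ["satisfaction_level", "time_spend_company"]

# Mean of each feature for stayers (left == 0) and leavers (left == 1).
group_means = df.groupby("left")[features].mean()

# Raw gap between the two groups, in each feature's own units.
raw_gap = group_means.loc[1] - group_means.loc[0]

# The same gap divided by the feature's standard deviation,
# which puts features measured on different scales on a common footing.
scaled_gap = raw_gap / df[features].std()

print(group_means)
print(scaled_gap)
```

On this toy data the sign of each scaled gap matches the sign of the corresponding correlation with left, so the two views need not contradict each other; whether that holds on the real data is something I have not verified.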
Now I am confused about whether to keep this as one of my features, and which approach I should use to select features.
Could someone please help me with this?
BTW, I am doing the ML in scikit-learn, and the data is taken from here.