Well, doing PCA and calculating Z-Scores may get you there, but there is a much better way to approach this kind of problem. Please consider using feature selection (usually treated as part of Feature Engineering) to identify the features that are most strongly related to your target (the dependent variable), and to remove the irrelevant or less important features, which do not contribute much to the target variable, in order to achieve better overall accuracy for your model. The example below fits a random forest on a credit data set and plots how important each feature is.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv("https://rodeo-tutorials.s3.amazonaws.com/data/credit-data-trainingset.csv")
df.head()
from sklearn.ensemble import RandomForestClassifier
features = np.array(['revolving_utilization_of_unsecured_lines',
'age', 'number_of_time30-59_days_past_due_not_worse',
'debt_ratio', 'monthly_income','number_of_open_credit_lines_and_loans',
'number_of_times90_days_late', 'number_real_estate_loans_or_lines',
'number_of_time60-89_days_past_due_not_worse', 'number_of_dependents'])
clf = RandomForestClassifier()
clf.fit(df[features], df['serious_dlqin2yrs'])
# sort the calculated importances in ascending order, so that the most
# important feature ends up at the top of the horizontal bar plot and we
# can visualize what is/isn't important
importances = clf.feature_importances_
sorted_idx = np.argsort(importances)
padding = np.arange(len(features)) + 0.5
plt.barh(padding, importances[sorted_idx], align='center')
plt.yticks(padding, features[sorted_idx])
plt.xlabel("Relative Importance")
plt.title("Variable Importance")
plt.show()
To customize that code to your specific data set, swap in your own file, your own list of feature columns, and your own target column (here 'serious_dlqin2yrs').
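If it helps, the same idea can be wrapped in a small helper that works on any DataFrame. This is only a rough sketch: the function name rank_features, the synthetic demo data, and the column names signal, noise_a, noise_b and label are all made up for illustration, and it assumes every feature column is numeric with no missing values.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def rank_features(df, target, n_estimators=100, random_state=0):
    """Hypothetical helper: return random forest feature importances
    as a Series, sorted from most to least important."""
    # Treat every column except the target as a candidate feature.
    feature_cols = [c for c in df.columns if c != target]
    clf = RandomForestClassifier(n_estimators=n_estimators,
                                 random_state=random_state)
    clf.fit(df[feature_cols], df[target])
    importances = pd.Series(clf.feature_importances_, index=feature_cols)
    return importances.sort_values(ascending=False)


# Synthetic data, invented for illustration: only "signal" drives the label.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "signal": rng.normal(size=500),
    "noise_a": rng.normal(size=500),
    "noise_b": rng.normal(size=500),
})
demo["label"] = (demo["signal"] > 0).astype(int)

ranking = rank_features(demo, "label")
print(ranking)

# Keep only the top-ranked feature(s); how many to keep is a judgment call.
top_features = ranking.index[:1].tolist()
print(top_features)
```

How many features to keep is up to you. A common approach is to retrain the model on the reduced set and check that the accuracy on held-out data does not get worse.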
Here are a couple of links that further explain how feature selection works.
https://github.com/WillKoehrsen/feature-selector/blob/master/Feature%20Selector%20Usage.ipynb
https://towardsdatascience.com/feature-selection-techniques-in-machine-learning-with-python-f24e7da3f36e
For your reference, here is a good link for understanding PCA better.
https://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_iris.html
Also, here is a great link for understanding Z-Scores better.
Pandas - Compute z-score for all columns
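In case it is useful, here is a minimal sketch of computing Z-Scores for all numeric columns with plain pandas. The helper name zscore_columns and the demo data are made up for illustration. It assumes the population standard deviation (ddof=0); pandas defaults to ddof=1, so pass that instead if you want the sample version.

```python
import pandas as pd


def zscore_columns(df, ddof=0):
    """Hypothetical helper: z-score every numeric column of df.

    Non-numeric columns are dropped from the result.
    """
    numeric = df.select_dtypes(include="number")
    # Subtract each column's mean and divide by its standard deviation.
    return (numeric - numeric.mean()) / numeric.std(ddof=ddof)


# Small made-up frame: two numeric columns and one text column.
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, 20.0, 30.0, 40.0],
    "name": ["w", "x", "y", "z"],
})

z = zscore_columns(demo)
print(z)
```

Note that a column with zero variance would produce NaN (division by zero), so you may want to drop constant columns first.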