I wrote a function in Python to compute a lower-triangular mutual information matrix.
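
    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    def get_mutual_information_matrix(X_train):
        """Return a lower-triangular matrix of pairwise mutual information between features."""
        p = len(X_train.columns)
        MI_matrix = np.zeros((p, p))
        for i in range(p):
            # Only fill the lower triangle (j < i); the upper triangle stays zero.
            for j in range(i):
                MI_matrix[i, j] = mutual_info_regression(
                    X_train.iloc[:, i].to_frame(), X_train.iloc[:, j], discrete_features=[False]
                )[0]
            # Diagonal is set to 1 as a placeholder, not an actual MI estimate.
            MI_matrix[i, i] = 1
        return MI_matrix
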
I would like to use this matrix to drop redundant features. For each pair of features whose mutual information exceeds a certain threshold, I would like to remove the one that has the lower mutual information with the target.
Does that make sense? How could I do that?
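
To make the idea concrete, here is a rough sketch of the procedure I have in mind. It is only one possible greedy reading of the rule, not a definitive implementation. The names `select_features_to_drop`, `mi_target`, and `threshold` are hypothetical, and I assume `mi_target` holds the mutual information of each feature with the target in the same column order (for example from `mutual_info_regression(X_train, y_train)`, where `y_train` is an assumed target series). Pairs are visited from most to least redundant, and a pair is skipped once one of its members has already been dropped.

```python
def select_features_to_drop(mi_matrix, mi_target, feature_names, threshold):
    """Greedily pick one feature to drop from each redundant pair.

    mi_matrix : lower-triangular pairwise MI (entry [i][j] is used for j < i)
    mi_target : MI of each feature with the target, same order as feature_names
    threshold : pairs with MI strictly above this value count as redundant
    """
    p = len(feature_names)
    # Collect redundant pairs from the lower triangle, most redundant first.
    pairs = [(mi_matrix[i][j], i, j)
             for i in range(p) for j in range(i)
             if mi_matrix[i][j] > threshold]
    pairs.sort(key=lambda t: t[0], reverse=True)

    dropped = set()
    for _, i, j in pairs:
        # If one member is already dropped, the pair is no longer redundant.
        if i in dropped or j in dropped:
            continue
        # Drop whichever feature is less informative about the target
        # (on a tie, the later column i is dropped).
        dropped.add(i if mi_target[i] <= mi_target[j] else j)
    return [feature_names[k] for k in sorted(dropped)]


# Hypothetical toy values, only to show the call.
names = ["a", "b", "c"]
mi = [[1.0, 0.0, 0.0],
      [0.9, 1.0, 0.0],
      [0.1, 0.2, 1.0]]
target_mi = [0.5, 0.3, 0.4]
to_drop = select_features_to_drop(mi, target_mi, names, threshold=0.5)
```

With the real data, the result could then be passed to `X_train.drop(columns=to_drop)`. Since the outcome depends on the order in which pairs are visited, a different ordering may drop a different set of features.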