I am feature scaling my data before logistic regression.
Everything works perfectly until I divide the columns by the max_min vector. The division seems to have worked in every column except the Age column, and I can't work out why.
I have already split the data into training and test sets, and below I am attempting to scale the X_train data.
# Working out the min value for each column and subtracting this from each row in the data
X_train_min = np.array(X_train0.min())
X_train0.sub(X_train_min.squeeze(), axis=1)
From the code above I obtain a table where each value has had the minimum value of its column subtracted, which is correct.
# Working out the max value for each column and the difference between the max and min values
X_train_max = np.array(X_train0.max())
max_min = np.array(X_train0.max()) - np.array(X_train0.min())
print(max_min)
Output:
[ 56 1 3 2 4 3 18174 56 7]
Here is where I face a problem:
# Dividing each row in the data by the difference between the max and min values of its column
X_train0.div(max_min, axis=1)
This gives a table where each value has been divided by the corresponding entry of max_min, except in the first column, 'Age', where the numbers do not match the expected result of the division.
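For reference, here is a minimal sketch of the min-max scaling I am aiming for, run on a small made-up DataFrame. X_demo, its column names, and its values are hypothetical and are not my real data. It shows that sub and div return new DataFrames rather than modifying the original in place, so the shifted result has to be assigned before it is divided. Whether that explains the Age column in my case is only an assumption.

```python
import pandas as pd

# Hypothetical stand-in for X_train0; column names and values are made up
X_demo = pd.DataFrame({"Age": [18, 46, 74], "Flag": [0, 1, 0], "Level": [0, 3, 1]})

col_min = X_demo.min()                     # per-column minimum
col_range = X_demo.max() - X_demo.min()    # per-column (max - min)

# sub() and div() return new DataFrames; X_demo itself is left unchanged
shifted = X_demo.sub(col_min, axis=1)
scaled = shifted.div(col_range, axis=1)

# Dividing the unshifted frame gives the same result only in columns
# whose minimum is 0, so a column such as Age comes out differently
unshifted = X_demo.div(col_range, axis=1)

print(scaled)
print(unshifted)
```

In this toy data, Flag and Level both have a minimum of 0, so they look identical in the two outputs and only Age differs.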