In earlier versions of sklearn's MinMaxScaler, one could fit the scaler on an explicit pair of minimum and maximum values, and it would then normalize data against that fixed range. In other words, the following was possible:
from sklearn import preprocessing
import numpy as np
x_data = np.array([[66,74,89], [1,44,53], [85,86,33], [30,23,80]])
scaler = preprocessing.MinMaxScaler()
scaler.fit([-90, 90])
b = scaler.transform(x_data)
This would cause the array above to be scaled to the range of (0,1) with the minimum possible value of -90 becoming 0, the maximum possible value of 90 becoming 1 and with all the values in-between getting scaled accordingly. With version 0.21 of sklearn this throws an error:
ValueError: Expected 2D array, got 1D array instead:
array=[-90. 90.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
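For reference, the behaviour I am after is just the linear mapping (x - min) / (max - min) with a fixed min of -90 and max of 90. As a plain NumPy sketch of the output I expect (lo, hi and expected are just my illustration names, not sklearn API):
import numpy as np
x_data = np.array([[66, 74, 89], [1, 44, 53], [85, 86, 33], [30, 23, 80]])
# Rescale linearly from the fixed range [-90, 90] to [0, 1]
lo, hi = -90, 90
expected = (x_data - lo) / (hi - lo)
print(expected)  # e.g. 66 -> (66 + 90) / 180 = 0.8667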
I changed scaler.fit([-90, 90]) to scaler.fit([[-90, 90]]), but then I got:
ValueError: operands could not be broadcast together with shapes (4,3) (2,) (4,3)
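My guess at what is going on (judging by the shapes in the message) is that fit([[-90, 90]]) is read as one sample with two features, so the fitted scaler expects two columns while x_data has three. Checking the fitted attributes (data_min_ and data_max_ are documented on MinMaxScaler) seems to confirm this:
scaler = preprocessing.MinMaxScaler()
scaler.fit([[-90, 90]])
print(scaler.data_min_)  # [-90.  90.] -- one minimum per "feature", i.e. two features
print(scaler.data_max_)  # [-90.  90.] -- identical, since there is only one sample
b = scaler.transform(x_data)  # fails: (4, 3) data against 2 fitted features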
I know for a fact that I can do scaler.fit(x_data), but this leads to the following result after transform:
[[0.77380952 0.80952381 1.        ]
 [0.         0.33333333 0.35714286]
 [1.         1.         0.        ]
 [0.3452381  0.         0.83928571]]
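To see where these numbers come from, I printed the fitted attributes (again, data_min_ / data_max_ per the docs):
print(scaler.data_min_)  # [ 1. 23. 33.] -- per-column minima of x_data
print(scaler.data_max_)  # [85. 86. 89.] -- per-column maxima of x_data
So the scaler apparently derives the range from each column of whatever it is fitted on, not from a single fixed pair of bounds.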
My issue with this is twofold: 1) the numbers do not seem correct. They were supposed to be scaled between 0 and 1, but I get many 0s and many 1s for values that should be higher or lower, respectively. 2) What if I want to scale every future array to the range (0, 1) based on a fixed range of, say, (-90, 90)? That was a convenient feature, but now I have to fit on a specific array to do my scaling. What is more, the scaling will produce different results every time, because I have to fit each future array anew and therefore get variable results.
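One workaround I have been experimenting with (my own idea; bounds and fixed_scaler are just my names, and I am not sure this is the intended use of the API) is to fit once on a two-row array that pins every column's minimum to -90 and maximum to 90, then reuse that scaler for all future data:
import numpy as np
from sklearn import preprocessing
x_data = np.array([[66, 74, 89], [1, 44, 53], [85, 86, 33], [30, 23, 80]])
bounds = np.array([[-90, -90, -90],
                   [ 90,  90,  90]])   # one row of minima, one row of maxima
fixed_scaler = preprocessing.MinMaxScaler()
fixed_scaler.fit(bounds)               # fixed range [-90, 90] for every column
b = fixed_scaler.transform(x_data)     # e.g. 66 -> 0.8667, 1 -> 0.5056
It seems to reproduce the old behaviour, but it feels like a hack, which is why I am asking.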
Am I missing something here? Is there a way to keep this nifty feature? And if there isn't, how can I make sure my data is scaled correctly and consistently every time?