
As explained in a similar question, it is easy to scale and test the data when you have a set of samples. But if the user has to predict the target of a single sample, how should one proceed? Please help.

Thanks.

  • There should be no difference: `scaled_sample = scaler.transform(sample)` – Dan Feb 19 '20 at 13:53
  • `scaler = MinMaxScaler(); x_train_scaled = scaler.fit_transform(x_train); a = scaler.transform(x_test.iloc[0, :])` When I scale a single sample, I get the following error: "Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample." – praveen kumar yethirajula Feb 19 '20 at 14:01
  • @Dan Thanks! Can you please look at the above case that I tried? My requirement is to test each sample individually after training the model. Can you please help me with this? – praveen kumar yethirajula Feb 19 '20 at 14:07

1 Answer


You can use the same MinMaxScaler() object that you fitted during training to transform your single instance. Here's an example.

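import numpy as np
from sklearn.preprocessing import MinMaxScaler

# training data: three samples with two features each
X_train = np.array([[1, 2], [3, 4], [5, 6]])
y_train = np.array([1, 0, 1])  # illustrative labels, one per training sample

# fit the scaler on the training data only
scaler = MinMaxScaler().fit(X_train)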

Scaling X_train:

X_train_scaled = scaler.transform(X_train)

Train the model using X_train_scaled and y_train ...


Predicting on the new sample np.array([7, 8]):

new_sample = np.array([7, 8]).reshape(1, -1)  # because the scaler expects a 2D array
scaler.transform(new_sample)  # pass this to model.predict()
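
To make the flow concrete, here is a minimal end-to-end sketch. The `LogisticRegression` model and the three labels in `y_train` are assumptions chosen only for illustration; any estimator with `fit` and `predict` would be used the same way.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[1, 2], [3, 4], [5, 6]])
y_train = np.array([0, 0, 1])  # made-up labels, one per training row

# Fit the scaler on the training data, then train on the scaled features.
scaler = MinMaxScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)

# A single sample must be 2D: one row, two feature columns.
new_sample = np.array([7, 8]).reshape(1, -1)       # shape (1, 2)
new_sample_scaled = scaler.transform(new_sample)   # reuse the fitted scaler
prediction = model.predict(new_sample_scaled)      # array holding one label
```

The important part is that `scaler` is fitted once on the training data and only `transform` is called on the new sample, so the sample is scaled with the training minimum and maximum.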

Edit:

How Min-Max Normalization works:

The following transformation is applied to each feature independently (see the Wikipedia article on feature scaling):

x_scaled = (x - min(x)) / (max(x) - min(x))

We will apply that to X_train:

X_train = np.array([[1, 2], [3, 4], [5, 6]])

array([[1, 2],
       [3, 4],
       [5, 6]])

# min, max of each feature
mn = np.min(X_train, axis=0)  # array([1, 2])
mx = np.max(X_train, axis=0)  # array([5, 6])

Calculating the scaled version:

(X_train - mn) / (mx - mn)

array([[0. , 0. ],
       [0.5, 0.5],
       [1. , 1. ]])

This matches the result of:

scaler = MinMaxScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)

array([[0. , 0. ],
       [0.5, 0.5],
       [1. , 1. ]])

When you supply a new input vector, the same transformation is applied using the mn and mx values computed above from the training data:

new_sample = np.array([7, 8]).reshape(1, -1)
(new_sample - mn) / (mx - mn)

array([[1.5, 1.5]])

This matches the output of scaler.transform(new_sample).

Also, you can extract the per-feature min and max from a fitted MinMaxScaler object using scaler.data_min_ and scaler.data_max_, which will match mn and mx above.
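
As a quick check, the sketch below compares the manual formula against the fitted scaler. It assumes the default `feature_range=(0, 1)`; with a different range the manual formula would need an extra rescaling step.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[1, 2], [3, 4], [5, 6]])
scaler = MinMaxScaler().fit(X_train)

# Per-feature statistics stored by the scaler during fit.
mn = scaler.data_min_
mx = scaler.data_max_

new_sample = np.array([[7, 8]])  # already 2D: one row, two features

# Manual min-max scaling with the training statistics.
manual = (new_sample - mn) / (mx - mn)
from_scaler = scaler.transform(new_sample)

# True when both approaches produce the same scaled values.
matches = np.allclose(manual, from_scaler)
```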

akilat90
  • As far as I know, reshaping with `reshape(-1, 1)` turns the single sample from row form into column form, and the scaler then rescales the sample based on the information in that column. Aren't we losing information this way? Can you please help me understand how the rescaling uses the per-feature values learned from x_train? – praveen kumar yethirajula Feb 19 '20 at 14:19
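
On the reshape question raised in the comments, the following sketch may help. The column names `f1` and `f2` and the DataFrames are hypothetical stand-ins for the asker's `x_train` and `x_test`. A single sample should be a row, i.e. `reshape(1, -1)` or a one-row DataFrame; `reshape(-1, 1)` would instead describe two samples with one feature each, which a scaler fitted on two features rejects.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data mirroring the example in the answer.
x_train = pd.DataFrame({"f1": [1, 3, 5], "f2": [2, 4, 6]})
x_test = pd.DataFrame({"f1": [7], "f2": [8]})
scaler = MinMaxScaler().fit(x_train)

row_as_series = x_test.iloc[0, :]  # 1D Series, shape (2,): triggers the reshape error
row_as_frame = x_test.iloc[[0]]    # 2D DataFrame, shape (1, 2): one sample, two features

# Each feature is scaled with its own training min and max.
scaled = scaler.transform(row_as_frame)

# Column form: two rows with one feature each, which does not match the fit.
as_column = row_as_series.to_numpy().reshape(-1, 1)  # shape (2, 1)
try:
    scaler.transform(as_column)
    column_accepted = True
except ValueError:
    column_accepted = False
```

Selecting with a list, `x_test.iloc[[0]]`, keeps the result two-dimensional, so no reshape is needed; `x_test.iloc[0, :].to_numpy().reshape(1, -1)` would be an equivalent NumPy route.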