If I have a list, say
l = [[1, 2], [1, 3], [4, 5], [5, 10]]
how can I normalize only the second column (2, 3, 5, 10) using StandardScaler from sklearn.preprocessing?
You don't need any special StandardScaler functionality
to do this. Instead, extract the second column of l yourself,
as follows:
import numpy as np
# Making an array from list l. dtype needs to be float
# so that the second column of l_arr can be replaced with
# its scaled counterpart without it being truncated to
# integers.
l_arr = np.array(l, dtype=float)
# Extracting the second column from l_arr
l_arr_2nd_col = l_arr[:,1]
# Converting l_arr_2nd_col into a column vector
l_arr_2nd_col = np.atleast_2d(l_arr_2nd_col).T
Once that's done, you can use StandardScaler
as follows:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(l_arr_2nd_col)
l_arr_2nd_col_scaled = scaler.transform(l_arr_2nd_col)
# ravel is needed, because l_arr[:,1] has shape (4,), but
# l_arr_2nd_col_scaled has shape (4,1).
l_arr[:,1] = l_arr_2nd_col_scaled.ravel()
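As a sanity check, the scaled column should match (x - mean) / std computed by hand, where std is the population standard deviation (ddof=0) that StandardScaler uses by default. A minimal sketch; the names col, scaled and manual are chosen here for illustration and are not part of the code above:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

l = [[1, 2], [1, 3], [4, 5], [5, 10]]
l_arr = np.array(l, dtype=float)

# Scale the second column with StandardScaler (it expects 2-D input).
col = l_arr[:, 1].reshape(-1, 1)
scaled = StandardScaler().fit_transform(col).ravel()

# Same computation by hand: subtract the mean, then divide by the
# population standard deviation (np.std defaults to ddof=0).
manual = (l_arr[:, 1] - l_arr[:, 1].mean()) / l_arr[:, 1].std()

print(np.allclose(scaled, manual))  # True
```

After scaling, the column has mean 0 and standard deviation 1, while the first column is left untouched.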
At this point, you can either do this:
# Replaces l entirely, including replacing the integers in its
# first column with their floating-point counterparts.
l = l_arr.tolist()
or this:
# Replace only selected components of l
for l_elem, l1_scaled in zip(l, l_arr[:,1]):
    l_elem[1] = l1_scaled
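Well, I figured this out too. This does the trick in one line (for anyone who needs it in the future), provided l is already a NumPy array with a float dtype and scaler is a StandardScaler instance:
l[:,1] = scaler.fit_transform(l[:,1].reshape(-1,1)).ravel()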
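For completeness, here is a self-contained sketch of that one-liner. It assumes l is first converted to a float NumPy array, since a plain list of lists does not support l[:,1] indexing and an integer array would truncate the scaled values:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# l must be a float NumPy array here, not a plain list of lists.
l = np.array([[1, 2], [1, 3], [4, 5], [5, 10]], dtype=float)
scaler = StandardScaler()

# reshape(-1, 1) turns the column into the 2-D shape the scaler expects;
# ravel() flattens the result so it fits back into l[:, 1].
l[:, 1] = scaler.fit_transform(l[:, 1].reshape(-1, 1)).ravel()

print(l)
```

If a nested list is needed afterwards, l.tolist() converts the array back.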