
If I have a list, say

l = [[1, 2], [1, 3], [4, 5], [5, 10]]

how can I normalize only the second column (2, 3, 5, 10) using StandardScaler from sklearn.preprocessing?

jjramsey
alex ale

2 Answers


You don't have to rely on the functionality of StandardScaler to do this. Rather, you can extract the second column of l as follows:

import numpy as np

# Making an array from list l. dtype needs to be float 
# so that the second column of l_arr can be replaced with
# its scaled counterpart without it being truncated to
# integers.
l_arr = np.array(l, dtype=float)

# Extracting the second column from l_arr
l_arr_2nd_col = l_arr[:,1]

# Converting l_arr_2nd_col into a column vector
l_arr_2nd_col = np.atleast_2d(l_arr_2nd_col).T

Once that's done, you can use StandardScaler as follows:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(l_arr_2nd_col)

l_arr_2nd_col_scaled = scaler.transform(l_arr_2nd_col)

# ravel is needed, because l_arr[:,1] has shape (4,), but
# l_arr_2nd_col_scaled has shape (4,1).
l_arr[:,1] = l_arr_2nd_col_scaled.ravel()

At this point, you can either do this:

# Replaces l entirely, including replacing the integers in its
# first column with their floating-point counterparts.
l = l_arr.tolist()

or this:

# Replace only selected components of l
for l_elem, l1_scaled in zip(l, l_arr[:,1]):
    l_elem[1] = l1_scaled
jjramsey
  • Oh, could you tell me how I can then replace the second column of this list with the scaled column? – alex ale May 19 '21 at 15:38
  • @alexale I've updated my answer accordingly. I would highly recommend not using lists for this sort of application, though. – jjramsey May 19 '21 at 19:32
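As a quick sanity check on the steps in this answer, the sketch below scales the second column and compares the result against the manual formula (subtract the column mean, divide by the population standard deviation). The variable names `col`, `scaled`, and `manual` are illustrative and not part of the original answer.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

l = [[1, 2], [1, 3], [4, 5], [5, 10]]

# Convert to a float array so the scaled values are not truncated to integers.
l_arr = np.array(l, dtype=float)

# Keep a copy of the original second column for the manual check below.
col = l_arr[:, 1].copy()

# StandardScaler expects 2D input of shape (n_samples, n_features),
# so the column is reshaped to (4, 1) and flattened again afterwards.
scaled = StandardScaler().fit_transform(col.reshape(-1, 1)).ravel()
l_arr[:, 1] = scaled

# Manual equivalent: np.std defaults to ddof=0 (population standard
# deviation), which is also what StandardScaler uses.
manual = (col - col.mean()) / col.std()

print(np.allclose(scaled, manual))  # the two computations should agree
print(l_arr[:, 0])                  # the first column is left untouched
```

After scaling, the second column has mean 0 and standard deviation 1, while the first column keeps its original values (now stored as floats).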

I figured this out too. This does the trick in one line (for anyone who needs it in the future):

l[:, 1] = scaler.fit_transform(l[:, 1].reshape(-1, 1)).ravel()
alex ale
  • While your one-liner is helpful if `l` is a 2D array, if I were to use your original list `l`, then I'd need to add `l = np.array(l, dtype = float)` before your one-liner (and I'd need the `dtype=float` argument to keep `l` from becoming an array of integers). – jjramsey May 20 '21 at 12:31
  • Yup, better two lines than 20. – alex ale May 21 '21 at 16:07
  • When using Python in actual code, usually that's true. In my answer, I broke it down into several steps so that it was at least somewhat clearer what I was doing and why. – jjramsey May 21 '21 at 20:24
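Combining the one-liner with the conversion step mentioned in the comments, a self-contained version might look like the sketch below. The name `arr` is an assumption made here to avoid rebinding the original list `l`; the one-liner as posted also presumes that `scaler` and the NumPy array already exist.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

l = [[1, 2], [1, 3], [4, 5], [5, 10]]

# A plain Python list does not support l[:, 1], so convert it first.
# dtype=float keeps the scaled values from being truncated to integers.
arr = np.array(l, dtype=float)

scaler = StandardScaler()

# Fit on the second column only (reshaped to 2D), then write the
# flattened result back into that column.
arr[:, 1] = scaler.fit_transform(arr[:, 1].reshape(-1, 1)).ravel()

# The fitted scaler keeps the statistics it used for the column.
print(scaler.mean_, scaler.scale_)
print(arr.tolist())
```

Keeping the `scaler` object around is useful if the same mean and standard deviation need to be applied to new data later via `scaler.transform`.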