How can I convert the StandardScaler() transformation back to dataframe?

Question

I'm working with a model, and after splitting into train and test, I want to apply StandardScaler(). However, this transformation converts my data into an array and I want to keep the format I had before. How can I do this?

Basically, I have:

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X = df[features]
y = df[["target"]]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42
)

sc = StandardScaler()
X_train_sc = sc.fit_transform(X_train)
X_test_sc = sc.transform(X_test)

How can I get X_train_sc back to the format that X_train had?

Update: I don't want to get X_train_sc to reverse back to before being scaled. I just want X_train_sc to be a dataframe in the easiest possible way.

There should be an `inverse_transform` method on `StandardScaler` that takes you back. — Sia, Oct 01 '20 at 18:45
inverse_transform changes the data back to what it was before scaling. I don't want that; I just want X_train_sc to be in the same format as X_train — dmmmmd, Oct 01 '20 at 18:52
After applying StandardScaler(), I lose track of the variable names. The result is an array without the column names. I just want a dataframe like X_train was — dmmmmd, Oct 01 '20 at 18:55
Something to realize is that `X_train` is not scaled yet. You are using `fit_transform` that completes two tasks for the data in one step. You should use `fit` separately to keep track of the variables, then apply `transform` in a different step. — Sia, Oct 01 '20 at 18:57

FBruzzesi · Accepted Answer · 2023-01-09T10:03:31.517

As you mentioned, applying the scaler returns a NumPy array. To get a dataframe, you can construct a new one from the result:

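import pandas as pd
from sklearn.preprocessing import StandardScaler

cols = X_train.columns
sc = StandardScaler()
# Fit on the training data only, then reuse the fitted scaler on the test data.
# Passing index= keeps the original (shuffled) row labels, so the scaled frames
# still line up with y_train and y_test.
X_train_sc = pd.DataFrame(sc.fit_transform(X_train), columns=cols, index=X_train.index)
X_test_sc = pd.DataFrame(sc.transform(X_test), columns=cols, index=X_test.index)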
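One detail worth checking with this approach: train_test_split shuffles the rows, so a dataframe built from the bare array gets a fresh 0..n-1 index unless the original index is passed along. Here is a minimal, self-contained sketch on a made-up toy dataframe (the column names a, b and target and all the values are hypothetical, chosen only for illustration):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data, only for illustration.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "b": [10.0, 20.0, 15.0, 30.0, 25.0, 40.0, 35.0, 50.0, 45.0, 60.0],
    "target": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})
X = df[["a", "b"]]
y = df[["target"]]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42
)

sc = StandardScaler()

# Wrap the returned arrays in DataFrames, reusing both the column
# names and the (shuffled) row index of the inputs.
X_train_sc = pd.DataFrame(
    sc.fit_transform(X_train), columns=X_train.columns, index=X_train.index
)
X_test_sc = pd.DataFrame(
    sc.transform(X_test), columns=X_test.columns, index=X_test.index
)

print(X_train_sc.head())
```

With the index preserved, X_train_sc can be joined or compared row by row with y_train without any extra realignment.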

2022 Update

As of scikit-learn version 1.2.0, you can use the set_output API to configure transformers to output pandas DataFrames (see the example in the documentation).

The above example would simplify as follows:

from sklearn.preprocessing import StandardScaler

# pandas must be installed, but the column names no longer need to be passed by hand.
sc = StandardScaler().set_output(transform="pandas")
X_train_sc = sc.fit_transform(X_train)
X_test_sc = sc.transform(X_test)
