Standardize variables in Python

Question

I have the below code:

import statsmodels.formula.api as smf

var_list = ['a', 'b', 'c', 'd', 'd', 'e', 'f', 'g', 'h', 'i']

y_var = 'lp'
for x_var in var_list:
    formula = y_var + ' ~ ' + x_var
    results = smf.ols(formula, data=df).fit()

I would like to standardize the variables in the list and re-configure my loop to use the standardized variables instead.

I don't have any code of my own for this. I searched a bit and found the following code (https://medium.com/@rrfd/standardize-or-normalize-examples-in-python-e3f174b65dfc), which does the transformation:

import pandas as pd
from sklearn import preprocessing
import numpy as np

# Get dataset
df = pd.read_csv("https://storage.googleapis.com/mledudatasets/california_housing_train.csv", sep=",")

# Keep the column names so they can be restored after scaling
names = df.columns

# Create the Scaler object
scaler = preprocessing.StandardScaler()

# Fit your data on the scaler object
scaled_df = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_df, columns=names)

What library are you using? – Brian61354270 Jun 25 '20 at 18:58
sklearn - the standardization is done using StandardScaler – Parvesh Jun 25 '20 at 19:55

Carlos Galdino · Answer 1 · 2020-06-25T23:01:37.563

1

What about this? (warning: code not tested)

import pandas as pd
import statsmodels.formula.api as smf
from sklearn import preprocessing
import numpy as np

# function that returns the scaled dataframe
def scale_df(df):

    # Get column names first
    names = df.columns

    # Create the Scaler object
    scaler = preprocessing.StandardScaler()

    # Fit your data on the scaler object
    scaled_df = scaler.fit_transform(df)

    return pd.DataFrame(scaled_df, columns=names)

# suppose you have two df's that need to be scaled
df_1 = pd.read_csv("https://storage.googleapis.com/mledudatasets/california_housing_train.csv", sep=",")
df_2 = pd.read_csv("https://storage.googleapis.com/mledudatasets/california_housing_train.csv", sep=",")

var_list = ['a', 'b', 'c', 'd', 'd', 'e', 'f', 'g', 'h', 'i']
results = []

# loop dataframes
for df in [df_1, df_2]:
    scaled_df = scale_df(df)
    
    # loop variables
    y_var = 'lp'
    for x_var in var_list:
        formula = y_var + ' ~ ' + x_var
        results.append(smf.ols(formula, data=scaled_df).fit())

The results will be appended to the list results.
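If only the predictors in var_list should be standardized (leaving lp and any non-numeric columns untouched), something along these lines might work. This is a minimal sketch, not tested against the real data: the helper name standardize_columns and the random stand-in DataFrame are made up for illustration, and the repeated names in var_list are dropped before scaling because selecting the same column twice would produce duplicate columns.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def standardize_columns(df, columns):
    """Return a copy of df with the given columns rescaled to mean 0, std 1."""
    unique_cols = list(dict.fromkeys(columns))  # drop repeats, keep order
    scaled = df.copy()
    scaled[unique_cols] = StandardScaler().fit_transform(df[unique_cols])
    return scaled


# Hypothetical stand-in data: 'lp' is the response, 'a' and 'b' are predictors.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "lp": rng.normal(size=50),
    "a": rng.normal(loc=10.0, scale=3.0, size=50),
    "b": rng.normal(loc=-2.0, scale=0.5, size=50),
})

var_list = ["a", "b"]
scaled_df = standardize_columns(df, var_list)
```

scaled_df can then be passed as data= in the regression loop in place of the original frame. Note that StandardScaler divides by the population standard deviation (ddof=0), so pandas' default .std() (ddof=1) on the scaled columns will come out slightly above 1.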

edited Jun 25 '20 at 23:01

answered Jun 25 '20 at 18:58

Carlos Galdino


However, I do not exactly understand what you mean by "standardize the variables". – Carlos Galdino Jun 25 '20 at 19:02
Hi Carlos, and thanks for the comment. Standardizing means re-scaling a variable to have a mean of zero and a standard deviation of 1, which is what I am looking to achieve with the variables in var_list. I found that this can be done using preprocessing from sklearn, but I am unsure how to loop the re-scaling over each variable in the variable list. – Parvesh Jun 25 '20 at 19:14
Can you do it for one variable? What would the code look like without a loop (for only one variable)? – Carlos Galdino Jun 25 '20 at 19:24
Hi again Carlos. Added example code from another website to show what I am looking to do. – Parvesh Jun 25 '20 at 19:58
Nice. I'll take a look at that. – Carlos Galdino Jun 25 '20 at 20:15
Hi Parvesh, I edited my answer based on the new information you gave. Still, I'm not sure I grasped what you are trying to do. – Carlos Galdino Jun 25 '20 at 23:04
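
For reference, the definition of standardizing given in the comments (mean of zero, standard deviation of 1) can be written out by hand for a single variable. The sketch below uses only the standard library; the function name standardize and the sample values are made up for illustration, and it uses the population standard deviation, which I believe is also what StandardScaler does.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale a sequence to mean 0 and population standard deviation 1."""
    mu = fmean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Made-up sample values: their mean is 5.0 and population std is 2.0.
a = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(a)
```

Each value becomes its distance from the mean measured in standard deviations, so 2.0 maps to -1.5 and 9.0 maps to 2.0 here.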
