I have the following code:

import statsmodels.formula.api as smf

var_list = ['a', 'b', 'c', 'd', 'd', 'e', 'f', 'g', 'h', 'i']

y_var = 'lp'
for x_var in var_list:
    formula = y_var + ' ~ ' + x_var
    results = smf.ols(formula, data=df).fit()

I would like to standardize the variables in the list and re-configure my loop to use the standardized variables instead.

I don't have any code of my own yet. After some searching I found the following code at https://medium.com/@rrfd/standardize-or-normalize-examples-in-python-e3f174b65dfc, which does the transformation:

from sklearn import preprocessing
import numpy as np
import pandas as pd

# Get dataset
df = pd.read_csv("https://storage.googleapis.com/mledudatasets/california_housing_train.csv", sep=",")

# Get column names first
names = df.columns

# Create the Scaler object
scaler = preprocessing.StandardScaler()

# Fit the scaler to the data and transform it
scaled_df = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_df, columns=names)
Parvesh

1 Answer

What about this? (warning: code not tested)

import pandas as pd
import statsmodels.formula.api as smf
from sklearn import preprocessing

# function that returns the scaled dataframe
def scale_df(df):

    # Get column names first
    names = df.columns

    # Create the Scaler object
    scaler = preprocessing.StandardScaler()

    # Fit your data on the scaler object
    scaled_df = scaler.fit_transform(df)

    return pd.DataFrame(scaled_df, columns=names)

# suppose you have two DataFrames that need to be scaled
df_1 = pd.read_csv("https://storage.googleapis.com/mledudatasets/california_housing_train.csv", sep=",")
df_2 = pd.read_csv("https://storage.googleapis.com/mledudatasets/california_housing_train.csv", sep=",")

var_list = ['a', 'b', 'c', 'd', 'd', 'e', 'f', 'g', 'h', 'i']
results = []

# loop dataframes
for df in [df_1, df_2]:
    scaled_df = scale_df(df)
    
    # loop variables
    y_var = 'lp'
    for x_var in var_list:
        formula = y_var + ' ~ ' + x_var
        results.append(smf.ols(formula, data=scaled_df).fit())

Each fitted model is appended to the list results, in the order the DataFrames and variables are looped over.
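
If the goal is to standardize only the predictors in var_list rather than every column, a minimal sketch along these lines may be closer to what the question asks. It assumes a DataFrame whose predictor columns are numeric and whose response is named lp; the synthetic data, the shortened var_list, and the name fits are made up for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: replace with the real DataFrame.
rng = np.random.default_rng(0)
var_list = ['a', 'b', 'c']
df = pd.DataFrame(rng.normal(loc=5.0, scale=2.0, size=(200, 3)), columns=var_list)
df['lp'] = 1.5 * df['a'] - 0.5 * df['b'] + rng.normal(size=200)

# Standardize only the predictors; the response 'lp' keeps its original scale.
scaled_df = df.copy()
scaled_df[var_list] = StandardScaler().fit_transform(df[var_list])

# Fit one simple regression per standardized predictor, keyed by variable name.
y_var = 'lp'
fits = {}
for x_var in var_list:
    formula = y_var + ' ~ ' + x_var
    fits[x_var] = smf.ols(formula, data=scaled_df).fit()

# Print the slope estimated for each standardized predictor.
for x_var, res in fits.items():
    print(x_var, res.params[x_var])
```

Keeping the fits in a dict keyed by variable name is a design choice, not a requirement: it makes it easier to look up a particular model later than indexing into a flat list.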

Carlos Galdino
  • However, I do not exactly understand what you mean by "standardize the variables". – Carlos Galdino Jun 25 '20 at 19:02
  • Hi Carlos, and thanks for the comment. To standardize means to re-scale a variable so that it has a mean of zero and a standard deviation of 1, which is what I am looking to achieve with the variables in var_list. I found that this can be done using preprocessing from sklearn. I am unsure how to loop the re-scaling over each variable in the variable list. – Parvesh Jun 25 '20 at 19:14
  • Can you do it for one variable? What would the code look like without a loop (for only one variable)? – Carlos Galdino Jun 25 '20 at 19:24
  • Hi again Carlos. Added an example code from another website to show what I am looking to do. – Parvesh Jun 25 '20 at 19:58
  • Nice. I'll take a look at that. – Carlos Galdino Jun 25 '20 at 20:15
  • Hi, Parvesh, I edited my answer based on the new information you gave. Still, I'm not sure I grasped what you are trying to do. – Carlos Galdino Jun 25 '20 at 23:04
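
For the single-variable case raised in the comments, the definition of standardizing (mean of zero, standard deviation of 1) can be sketched by hand without a loop. The values below are made up; the population standard deviation (ddof=0) is used because that is the convention sklearn's StandardScaler follows.

```python
import numpy as np
import pandas as pd

# Made-up values for a single column; its mean is 5 and its population std is 2.
x = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0], name='a')

# z-score: subtract the mean, then divide by the population standard deviation.
z = (x - x.mean()) / x.std(ddof=0)

print(z.tolist())  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```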