How to apply feature scaling with StandardScaler to multiple columns in machine learning?

Question

We need to predict credit card approvals using various machine learning methods.

import pandas as pd
import numpy as np

ccard = pd.read_csv("Credit_card.csv")
ccard_label = pd.read_csv("Credit_card_label.csv")

cc_merged = pd.merge(ccard, ccard_label, how='outer', on='Ind_ID')

cc_merged = cc_merged.drop('Type_Occupation', axis = 'columns')

X = cc_merged.iloc[:, 1:-1].values
y = cc_merged.iloc[:, -1].values

Taking Care of Missing Data

cc_merged.isnull().sum()

Ind_ID             0
GENDER             7
Car_Owner          0
Propert_Owner      0
CHILDREN           0
Annual_income     23
Type_Income        0
EDUCATION          0
Marital_status     0
Housing_type       0
Birthday_count    22
Employed_days      0
Mobile_phone       0
Work_Phone         0
Phone              0
EMAIL_ID           0
Family_Members     0
label              0
dtype: int64

Checking the merged dataframe for null values shows:

GENDER = 7 null values, Annual_income = 23 null values, Birthday_count = 22 null values. Type_Occupation (dropped above) had 488 null values.

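# Forward-fill GENDER and Birthday_count; fill Annual_income with the column mean.
# (.ffill() replaces fillna(method='pad'), which is deprecated in recent pandas.)
cc_merged['GENDER'] = cc_merged['GENDER'].ffill()
cc_merged['Annual_income'] = cc_merged['Annual_income'].fillna(cc_merged['Annual_income'].mean())
cc_merged['Birthday_count'] = cc_merged['Birthday_count'].ffill()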

Encoding the Categorical Data

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

ct = ColumnTransformer(transformers = [('encoder', OneHotEncoder(sparse_output=False), [2,3,6,7,8,9])], remainder='passthrough')

X = np.array(ct.fit_transform(X))

X
array([[0.0, 1.0, 0.0, ..., 0, 0, 2],
       [0.0, 1.0, 1.0, ..., 1, 0, 2],
       [0.0, 1.0, 1.0, ..., 1, 0, 2],
       ...,
       [0.0, 1.0, 0.0, ..., 0, 0, 4],
       [0.0, 1.0, 1.0, ..., 1, 0, 2],
       [0.0, 1.0, 0.0, ..., 0, 0, 2]], dtype=object)

y
array([1, 1, 1, ..., 0, 0, 0])

Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 1)

Feature Scaling

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train[:, 5:] = sc.fit_transform(X_train[:, 5:])
X_train[:, 11:] = sc.fit_transform(X_train[:, 11:])

X_test[:, 5:] = sc.transform(X_test[:, 5:])
X_test[:, 11:] = sc.transform(X_test[:, 11:])

This raises the following error:

ValueError: could not convert string to float: 'F'
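
One way to find which columns of an object-dtype array still hold strings is to try converting each column to float and record the ones that fail. The sketch below does this on a small made-up array; `non_numeric_columns` and `X_demo` are hypothetical names used only for illustration, not part of the original code.

```python
import numpy as np

def non_numeric_columns(arr):
    """Return the indices of columns that cannot be converted to float."""
    bad = []
    for j in range(arr.shape[1]):
        try:
            arr[:, j].astype(float)
        except (ValueError, TypeError):
            bad.append(j)
    return bad

# Made-up stand-in for an encoded X with dtype=object and mixed content.
X_demo = np.array(
    [[0.0, 1.0, "F", "Y", 180000.0],
     [1.0, 0.0, "M", "N", 315000.0],
     [0.0, 1.0, "F", "N", 90000.0]],
    dtype=object,
)

bad_cols = non_numeric_columns(X_demo)
print(bad_cols)  # columns 2 and 3 hold strings
```

Running the same helper on the real `X_train` should list the positions that StandardScaler cannot handle.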

You didn't get rid of all of the strings. There's still the columns Gender, Car Owner, and Type_Income. You can check this yourself by selectively converting array columns to string and seeing which ones fail. — Nick ODell, Sep 01 '23 at 17:29
PS: I would suggest using Pandas DataFrames, rather than NumPy arrays within this pipeline. The sklearn docs have a good example here: https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html — Nick ODell, Sep 01 '23 at 17:31
Dear @NickODell can you provide the code so that I can get what is wrong with this... — Yogesh Navandhar, Sep 02 '23 at 02:52
