We are trying to predict credit card approvals using several machine learning methods.
import pandas as pd
import numpy as np
ccard = pd.read_csv("Credit_card.csv")
ccard_label = pd.read_csv("Credit_card_label.csv")
cc_merged = pd.merge(ccard, ccard_label, how='outer', on='Ind_ID')
cc_merged = cc_merged.drop('Type_Occupation', axis = 'columns')
X = cc_merged.iloc[:, 1:-1].values
y = cc_merged.iloc[:, -1].values
Taking Care of Missing Data
cc_merged.isnull().sum()
Ind_ID 0
GENDER 7
Car_Owner 0
Propert_Owner 0
CHILDREN 0
Annual_income 23
Type_Income 0
EDUCATION 0
Marital_status 0
Housing_type 0
Birthday_count 22
Employed_days 0
Mobile_phone 0
Work_Phone 0
Phone 0
EMAIL_ID 0
Family_Members 0
label 0
dtype: int64
Checking the merged dataframe for null values showed:
GENDER = 7, Annual_income = 23, Birthday_count = 22, and Type_Occupation = 488 (that column is dropped above, so it is missing from the output).
cc_merged['GENDER'] = cc_merged['GENDER'].fillna(method = 'pad')
cc_merged['Annual_income'] = cc_merged['Annual_income'].fillna(cc_merged['Annual_income'].mean())
cc_merged['Birthday_count'] = cc_merged['Birthday_count'].fillna(method = 'pad')
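One thing worth checking at this point: `X` and `y` were extracted with `.values` before these `fillna` calls, so the arrays may still hold the original missing values. Below is a minimal sketch on a made-up frame. The toy data and the names `toy`, `X_before`, and `X_after` are illustrative and not part of the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for cc_merged (values are made up for illustration).
toy = pd.DataFrame({
    "GENDER": ["F", None, "M"],
    "Annual_income": [100.0, np.nan, 300.0],
})

# Extract the array first, in the same order as the code above.
X_before = toy.values

# Fill the missing values afterwards.
toy["GENDER"] = toy["GENDER"].ffill()
toy["Annual_income"] = toy["Annual_income"].fillna(toy["Annual_income"].mean())

# Extract again after filling.
X_after = toy.values

# Count missing entries in each array.
missing_before = int(pd.isnull(X_before).sum())
missing_after = int(pd.isnull(X_after).sum())
print(missing_before, missing_after)
```

If the same thing happens with the real data, moving the `X = ...` and `y = ...` lines below the `fillna` calls should make the imputation take effect.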
Encoding the Categorical Data
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers = [('encoder', OneHotEncoder(sparse_output=False), [2,3,6,7,8,9])], remainder='passthrough')
X = np.array(ct.fit_transform(X))
X
array([[0.0, 1.0, 0.0, ..., 0, 0, 2],
       [0.0, 1.0, 1.0, ..., 1, 0, 2],
       [0.0, 1.0, 1.0, ..., 1, 0, 2],
       ...,
       [0.0, 1.0, 0.0, ..., 0, 0, 4],
       [0.0, 1.0, 1.0, ..., 1, 0, 2],
       [0.0, 1.0, 0.0, ..., 0, 0, 2]], dtype=object)
y
array([1, 1, 1, ..., 0, 0, 0])
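To see which columns the indices `[2,3,6,7,8,9]` actually select, it can help to map them back to names. The sketch below rebuilds the column list from the `isnull()` output above and applies the same `[1:-1]` slice that was used for `X`. The variable names are illustrative.

```python
# Column order of cc_merged, copied from the isnull() output above.
merged_cols = [
    "Ind_ID", "GENDER", "Car_Owner", "Propert_Owner", "CHILDREN",
    "Annual_income", "Type_Income", "EDUCATION", "Marital_status",
    "Housing_type", "Birthday_count", "Employed_days", "Mobile_phone",
    "Work_Phone", "Phone", "EMAIL_ID", "Family_Members", "label",
]

# X = cc_merged.iloc[:, 1:-1] drops the first and last columns,
# so every position in X is shifted by one relative to cc_merged.
x_cols = merged_cols[1:-1]

# Indices passed to the OneHotEncoder in the ColumnTransformer.
encoder_idx = [2, 3, 6, 7, 8, 9]
encoded = [x_cols[i] for i in encoder_idx]
passed_through = [c for i, c in enumerate(x_cols) if i not in encoder_idx]

print("one-hot encoded:", encoded)
print("passed through: ", passed_through)
```

If this mapping is right, `GENDER` (index 0 in `X`) is not encoded and is passed through unchanged, which would be consistent with a string such as 'F' reaching a later numeric step. Numeric columns such as `CHILDREN` and `Birthday_count` are one-hot encoded instead.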
Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 1)
Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train[:, 5:] = sc.fit_transform(X_train[:, 5:])
X_train[:, 11:] = sc.fit_transform(X_train[:, 11:])
X_test[:, 5:] = sc.transform(X_test[:, 5:])
X_test[:, 11:] = sc.transform(X_test[:, 11:])
Running this gives the following error:
ValueError: could not convert string to float: 'F'