
I need to predict credit card approvals using several machine learning methods, but my preprocessing fails at the feature scaling step.

The data comes in two files, Credit_card.csv and Credit_card_label.csv:

import pandas as pd
import numpy as np
ccard = pd.read_csv("Credit_card.csv")
ccard_label = pd.read_csv("Credit_card_label.csv")

cc_merged = pd.merge(ccard, ccard_label, how='outer', on='Ind_ID')
cc_merged = cc_merged.drop('Type_Occupation', axis = 'columns')

X = cc_merged.iloc[:, 1:-1].values
y = cc_merged.iloc[:, -1].values
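One thing worth checking at this point: `X` is extracted here, before the missing values are filled in further down, so the later `fillna` calls on `cc_merged` may never reach `X`. A minimal sketch on a toy frame (the rows below are invented for illustration) shows the effect:

```python
import numpy as np
import pandas as pd

# Toy frame; the values are invented purely for illustration.
toy = pd.DataFrame({
    "Ind_ID": [1, 2, 3],
    "GENDER": ["F", np.nan, "M"],
    "label": [1, 0, 0],
})

X_before = toy.iloc[:, 1:-1].values     # snapshot taken before imputation
toy["GENDER"] = toy["GENDER"].ffill()   # fill the gap in the DataFrame
X_after = toy.iloc[:, 1:-1].values      # snapshot taken after imputation

print(pd.isna(X_before).any())  # the early snapshot still holds the gap
print(pd.isna(X_after).any())   # the late snapshot does not
```

If that matches what happens with the real data, moving the `X = ...` and `y = ...` lines below the imputation step would be one way to address it.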

Taking Care of Missing Data

cc_merged.isnull().sum()

Ind_ID             0
GENDER             7
Car_Owner          0
Propert_Owner      0
CHILDREN           0
Annual_income     23
Type_Income        0
EDUCATION          0
Marital_status     0
Housing_type       0
Birthday_count    22
Employed_days      0
Mobile_phone       0
Work_Phone         0
Phone              0
EMAIL_ID           0
Family_Members     0
label              0
dtype: int64

Checking the merged dataframe for null values shows that GENDER has 7, Annual_income has 23, and Birthday_count has 22. Type_Occupation, which I dropped above, had 488 null values.

cc_merged['GENDER'] = cc_merged['GENDER'].fillna(method = 'pad')
cc_merged['Annual_income'] = cc_merged['Annual_income'].fillna(cc_merged['Annual_income'].mean())
cc_merged['Birthday_count'] = cc_merged['Birthday_count'].fillna(method = 'pad')
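Recent pandas releases deprecate the `method` argument of `fillna`, so the same forward fill may be better written with `ffill()`. A minimal sketch on made-up data, assuming the goal is unchanged (forward-fill `GENDER` and `Birthday_count`, mean-fill `Annual_income`):

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for cc_merged; only the three columns with gaps.
df = pd.DataFrame({
    "GENDER": ["F", np.nan, "M", "M"],
    "Annual_income": [100000.0, np.nan, 300000.0, 200000.0],
    "Birthday_count": [15000.0, 12000.0, np.nan, 20000.0],
})

df["GENDER"] = df["GENDER"].ffill()                  # same idea as method='pad'
df["Birthday_count"] = df["Birthday_count"].ffill()
df["Annual_income"] = df["Annual_income"].fillna(df["Annual_income"].mean())

print(df.isnull().sum())
```

Note that a forward fill cannot fill a gap in the very first row, since there is no earlier value to copy.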

Encoding the Categorical Data

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

ct = ColumnTransformer(transformers = [('encoder', OneHotEncoder(sparse_output=False), [2,3,6,7,8,9])], remainder='passthrough')

X = np.array(ct.fit_transform(X))

X
array([[0.0, 1.0, 0.0, ..., 0, 0, 2],
       [0.0, 1.0, 1.0, ..., 1, 0, 2],
       [0.0, 1.0, 1.0, ..., 1, 0, 2],
       ...,
       [0.0, 1.0, 0.0, ..., 0, 0, 4],
       [0.0, 1.0, 1.0, ..., 1, 0, 2],
       [0.0, 1.0, 0.0, ..., 0, 0, 2]], dtype=object)

y
array([1, 1, 1, ..., 0, 0, 0])
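The positions passed to `OneHotEncoder` refer to columns of `X`, which no longer contains `Ind_ID`, so every position is shifted by one relative to `cc_merged`. A small sketch, using the column order from the `isnull()` output above, lists which columns the positions `[2,3,6,7,8,9]` actually select. The list of text-valued columns is an assumption based on the column names, not something checked against the data.

```python
# Column order as printed by cc_merged.isnull().sum() above.
columns = [
    "Ind_ID", "GENDER", "Car_Owner", "Propert_Owner", "CHILDREN",
    "Annual_income", "Type_Income", "EDUCATION", "Marital_status",
    "Housing_type", "Birthday_count", "Employed_days", "Mobile_phone",
    "Work_Phone", "Phone", "EMAIL_ID", "Family_Members", "label",
]

feature_columns = columns[1:-1]          # mirrors cc_merged.iloc[:, 1:-1]
encoded_positions = [2, 3, 6, 7, 8, 9]   # positions passed to OneHotEncoder

encoded = [feature_columns[i] for i in encoded_positions]
print(encoded)
# ['Propert_Owner', 'CHILDREN', 'EDUCATION', 'Marital_status',
#  'Housing_type', 'Birthday_count']

# Assumption: these are the text-valued columns that need encoding.
assumed_categorical = ["GENDER", "Car_Owner", "Propert_Owner", "Type_Income",
                       "EDUCATION", "Marital_status", "Housing_type"]
missed = [c for c in assumed_categorical if c not in encoded]
print(missed)  # ['GENDER', 'Car_Owner', 'Type_Income']
```

Under that assumption, three text columns pass through unencoded, while two numeric columns (`CHILDREN` and `Birthday_count`) get one-hot encoded instead.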

Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 1)

Feature Scaling

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train[:, 5:] = sc.fit_transform(X_train[:, 5:])
X_train[:, 11:] = sc.fit_transform(X_train[:, 11:])

X_test[:, 5:] = sc.transform(X_test[:, 5:])
X_test[:, 11:] = sc.transform(X_test[:, 11:])
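Calling `fit_transform` twice on the same scaler refits it, so the statistics from the first fit are lost and the later `transform` calls on the test set rely only on the second fit. A minimal sketch, assuming all numeric columns sit in one contiguous block (the arrays and the `numeric` slice below are made up), fits the scaler once on the training rows and reuses it for the test rows:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Made-up data: 2 already-encoded 0/1 columns followed by 3 numeric columns.
X_train = np.hstack([rng.integers(0, 2, (8, 2)), rng.normal(50, 10, (8, 3))])
X_test = np.hstack([rng.integers(0, 2, (4, 2)), rng.normal(50, 10, (4, 3))])

numeric = slice(2, None)   # hypothetical position of the numeric block

sc = StandardScaler()
X_train[:, numeric] = sc.fit_transform(X_train[:, numeric])  # fit on train only
X_test[:, numeric] = sc.transform(X_test[:, numeric])        # reuse train stats

print(X_train[:, numeric].mean(axis=0).round(6))  # close to zero per column
```

The one-hot columns are left untouched here, which keeps them as plain 0/1 indicators.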

The feature scaling step fails with this error:

ValueError: could not convert string to float: 'F'
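The message means that at least one column reaching `StandardScaler` still holds text. A rough way to find which ones, sketched here on a made-up object array standing in for `X_train`, is to try converting each column to float and record the ones that fail (`non_numeric_columns` is a hypothetical helper, not a library function):

```python
import numpy as np

# Made-up object array standing in for X_train after encoding.
X_demo = np.array([
    [0.0, 1.0, "F", 180000.0, "Y"],
    [1.0, 0.0, "M", 315000.0, "N"],
], dtype=object)

def non_numeric_columns(arr):
    """Return the positions of columns that cannot be cast to float."""
    bad = []
    for j in range(arr.shape[1]):
        try:
            arr[:, j].astype(float)
        except (ValueError, TypeError):
            bad.append(j)
    return bad

print(non_numeric_columns(X_demo))  # [2, 4]
```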
  • You didn't get rid of all of the strings. There's still the columns Gender, Car Owner, and Type_Income. You can check this yourself by selectively converting array columns to string and seeing which ones fail. – Nick ODell Sep 01 '23 at 17:29
  • PS: I would suggest using Pandas DataFrames, rather than NumPy arrays within this pipeline. The sklearn docs have a good example here: https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html – Nick ODell Sep 01 '23 at 17:31
  • Please also provide Credit_card_label.csv – Stu Sztukowski Sep 01 '23 at 20:53
  • Dear @NickODell can you provide the code so that I can get what is wrong with this... – Yogesh Navandhar Sep 02 '23 at 02:52
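Following the suggestion above to work with DataFrames, here is a hedged sketch of what a name-based pipeline could look like. It runs on synthetic data, because the CSV files are not available here. The category values, the split into `categorical` and `numeric` columns, and the choice of `LogisticRegression` are all assumptions for illustration, not the asker's actual setup.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for cc_merged; all values are randomly generated.
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "GENDER": rng.choice(["F", "M"], n),
    "Car_Owner": rng.choice(["Y", "N"], n),
    "Type_Income": rng.choice(["Working", "Pensioner"], n),
    "Annual_income": rng.normal(200000, 50000, n),
    "CHILDREN": rng.integers(0, 4, n),
    "label": rng.integers(0, 2, n),
})
df.loc[::25, "Annual_income"] = np.nan   # inject a few gaps

categorical = ["GENDER", "Car_Owner", "Type_Income"]   # assumed text columns
numeric = ["Annual_income", "CHILDREN"]                # assumed numeric columns

# Columns are selected by name, so positional offsets cannot drift.
preprocess = ColumnTransformer(transformers=[
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("num", Pipeline([("impute", SimpleImputer(strategy="mean")),
                      ("scale", StandardScaler())]), numeric),
])

model = Pipeline([("preprocess", preprocess),
                  ("classifier", LogisticRegression(max_iter=1000))])

X = df.drop(columns="label")
y = df["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

model.fit(X_train, y_train)
print(model.score(X_test, y_test))   # labels are random, so not meaningful
```

Because imputation, encoding, and scaling all live inside the pipeline, they are fitted on the training rows only and then applied unchanged to the test rows.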
