I'm building a decision tree on a dataset with 21 columns, a mix of numeric and categorical variables. As I understand it, sklearn's decision trees don't support categorical variables directly, so I set the numeric variables aside and converted the categorical ones to numeric using label encoding. My plan was then to combine both groups back into one DataFrame so I can split it into training and testing data. However, when I tried to add the two together (the original numeric variables plus the categorical variables converted to numeric), I received a ValueError.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
credit = pd.read_csv('german_credit_risk.csv')
credit.head(10)
credit.info()
credit.describe(include='all')
col_names = ['Duration', 'Credit.Amount', 'Disposable.Income', 'Present.Residence', 'Age', 'Existing.Credits', 'Number.Liable', 'Cost.Matrix']
obj_cols = list(credit.select_dtypes(include='O').columns)
obj_cols
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
encoded_obj_df = pd.DataFrame(columns=obj_cols)
for col in obj_cols:
    encoded_obj_df[col] = le.fit_transform(credit[col])
encoded_obj_df.head(10)
credit.columns = col_names + encoded_obj_df  # this is the line that raises the ValueError
Do I have the right idea, and am I just not combining the two groups properly?
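For reference, here is a minimal sketch of one way the two groups could be combined. The ValueError most likely comes from `col_names + encoded_obj_df`, which asks pandas to add a Python list to a DataFrame element by element, and assigning to `credit.columns` only renames columns rather than adding data. Joining the frames column-wise with `pd.concat(..., axis=1)` is one alternative. The toy data, the `Purpose`, `Housing`, and `Class` column names, and the choice of `Class` as the target are assumptions for illustration, not taken from the real CSV.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for german_credit_risk.csv (illustrative columns only).
credit = pd.DataFrame({
    "Duration": [6, 48, 12, 42, 24, 36],
    "Age": [67, 22, 49, 45, 53, 35],
    "Purpose": ["radio", "radio", "education", "furniture", "car", "education"],
    "Housing": ["own", "own", "own", "free", "free", "rent"],
    "Class": ["good", "bad", "good", "good", "bad", "good"],
})

target = "Class"  # assumed name of the label column
features = credit.drop(columns=target)

# Split the feature columns into numeric and non-numeric groups.
num_cols = list(features.select_dtypes(include="number").columns)
obj_cols = list(features.select_dtypes(exclude="number").columns)

# Label-encode each categorical column with its own encoder.
# Reusing credit.index keeps the rows aligned with the numeric columns.
encoded_obj_df = pd.DataFrame(index=credit.index)
for col in obj_cols:
    encoded_obj_df[col] = LabelEncoder().fit_transform(features[col])

# Join the two groups side by side (column-wise), not with "+".
X = pd.concat([features[num_cols], encoded_obj_df], axis=1)
y = credit[target]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)
```

After this, `X` holds every feature as a numeric column, so it can be passed to `DecisionTreeClassifier.fit(X_train, y_train)`. Note that label encoding assigns arbitrary integer codes to categories, which a tree will treat as ordered values; one-hot encoding (for example `pd.get_dummies`) is a common alternative when that ordering is a concern.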