I'm trying to convert categorical value (in my case it is country column) into encoded value using LabelEncoder and then with OneHotEncoder and was able to convert the categorical value. But i'm getting warning like OneHotEncoder 'categorical_features' keyword is deprecated "use the ColumnTransformer instead." So how i can use ColumnTransformer to achieve same result ?
Below is my input data set and the code which i tried
Input Data set
Country Age Salary
France 44 72000
Spain 27 48000
Germany 30 54000
Spain 38 61000
Germany 40 67000
France 35 58000
Spain 26 52000
France 48 79000
Germany 50 83000
France 37 67000
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
#X is my dataset variable name
label_encoder = LabelEncoder()
x.iloc[:,0] = label_encoder.fit_transform(x.iloc[:,0]) #LabelEncoder is used to encode the country value
hot_encoder = OneHotEncoder(categorical_features = [0])
x = hot_encoder.fit_transform(x).toarray()
And the output i'm getting as, How can i get the same output with column transformer
0(fran) 1(ger) 2(spain) 3(age) 4(salary)
1 0 0 44 72000
0 0 1 27 48000
0 1 0 30 54000
0 0 1 38 61000
0 1 0 40 67000
1 0 0 35 58000
0 0 1 36 52000
1 0 0 48 79000
0 1 0 50 83000
1 0 0 37 67000
i tried following code
from sklearn.compose import ColumnTransformer, make_column_transformer
preprocess = make_column_transformer(
( [0], OneHotEncoder())
)
x = preprocess.fit_transform(x).toarray()
i was able to encode country column with the above code, but missing age and salary column from x varible after transforming