I have a dataset (breast-cancer detection) in which all the data is numerical, and I have divided it into X (containing all the features) and y (the output class). After splitting the data into training and test sets, I run into a problem when applying feature scaling: I get ValueError: could not convert string to float: '?', although I have already replaced '?' with -9999.
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values
# Split the data into training and test sets.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
# Replace '?' with -9999.
df = df.replace('?', -9999)
# Apply label encoding to y.
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)
# One-hot encode the first column of X and pass the remaining columns through.
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
X = np.array(ct.fit_transform(X))
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train[:, 1:] = sc.fit_transform(X_train[:, 1:])
X_test[:, 1:] = sc.transform(X_test[:, 1:])
# After this step I get the ValueError. How can I make sure that no '?' values remain in the data, or is there some categorical encoding I still need to do?
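
One likely cause, judging only from the snippet above: X and y are extracted from df before df.replace is called, and df.replace returns a new DataFrame, so the arrays already passed to train_test_split still hold the '?' strings. Below is a minimal sketch of a reordered pipeline. The toy DataFrame and its column names are hypothetical stand-ins for the real data, and pd.to_numeric(..., errors="coerce") followed by fillna(-9999) is swapped in for df.replace so that the columns also end up with a numeric dtype.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the breast-cancer data; column names are made up.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "clump": [5, 3, 6, 4, 8, 1, 2, 7],
    "nuclei": ["1", "10", "?", "4", "?", "2", "3", "9"],
    "label": [2, 4, 2, 2, 4, 2, 2, 4],
})

# Arrays taken before cleaning are not updated by later changes to df.
X_stale = df.iloc[:, :-1].values
stale_has_qmark = bool(pd.DataFrame(X_stale).isin(["?"]).to_numpy().any())

# Clean first: non-numeric entries such as '?' become NaN, then -9999.
df_clean = df.apply(pd.to_numeric, errors="coerce").fillna(-9999)

# Check that no '?' is left before going any further.
clean_has_qmark = bool(df_clean.isin(["?"]).to_numpy().any())

# Only now extract the arrays, split, and scale.
X = df_clean.iloc[:, :-1].to_numpy(dtype=float)
y = df_clean.iloc[:, -1].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

sc = StandardScaler()
# Column 0 is assumed to be an identifier and is left unscaled, as in the question.
X_train[:, 1:] = sc.fit_transform(X_train[:, 1:])
X_test[:, 1:] = sc.transform(X_test[:, 1:])

print("'?' in stale array:", stale_has_qmark)
print("'?' in cleaned frame:", clean_has_qmark)
```

A quick way to find offending columns in the real data is to look at df.dtypes: any feature column still reported as object (or string) contains at least one non-numeric entry. Any encoding step would also need to run before the split, or be fitted on X_train and applied to X_test, so that the scaled arrays are the encoded ones.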