It's always better to include a fully working example in your question; it can and should be minimal. As @anastasiya-Romanova pointed out, you have to pass the steps to the Pipeline constructor correctly, which is also shown here.
from sklearn.datasets import make_blobs
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
import pandas as pd
# Generate synthetic data + make a pseudo-categorical column with qcut
X, y = make_blobs(n_samples=1000, centers=2, random_state=42)
X = pd.DataFrame(X)
X.columns = ["feat1", "feat2"]
X["feat2"] = pd.qcut(X["feat2"], 3, labels=False, duplicates="drop")
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create the pipeline: scale the numeric column, one-hot encode the categorical one
pipeline = Pipeline([
    ('preprocessor', ColumnTransformer([
        ('scaler', StandardScaler(), ['feat1']),
        ('onehot', OneHotEncoder(handle_unknown='ignore'), ['feat2'])
    ])),
    ('classifier', GaussianNB())
])
# Fit the pipeline to the training data
pipeline.fit(X_train, y_train)
# Evaluate the model on the test data
accuracy = pipeline.score(X_test, y_test)
print('Test accuracy:', accuracy)
# Show what the preprocessor (fitted on the training data) does to the features
X_transformed = pd.DataFrame(pipeline.named_steps['preprocessor'].transform(X))
print(X_transformed.head())
This prints:
0 1 2 3
0 -0.757494 0.0 1.0 0.0
1 1.396373 1.0 0.0 0.0
2 0.648693 1.0 0.0 0.0
3 1.085098 1.0 0.0 0.0
4 0.895531 0.0 1.0 0.0
Test accuracy: 1.0
For completeness, here is the example from the linked scikit-learn documentation, which uses a Pipeline in the same way:
>>> from sklearn.svm import SVC
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.pipeline import Pipeline
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
... random_state=0)
>>> pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
>>> # The pipeline can be used as any other estimator
>>> # and avoids leaking the test set into the train set
>>> pipe.fit(X_train, y_train)
Pipeline(steps=[('scaler', StandardScaler()), ('svc', SVC())])
>>> pipe.score(X_test, y_test)
0.88
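Because the pipeline behaves like any other estimator, the same leakage-free behavior carries over to cross-validation: each fold refits the scaler on its own training split only. A short sketch of this, reusing the estimators from the documentation example:

```python
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(random_state=0)
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# cross_val_score clones and refits the whole pipeline per fold,
# so the scaler never sees the fold's validation data during fitting
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

Passing the bare SVC with pre-scaled data instead would leak information from the validation folds into the scaling step.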