Here is basic code for training a model in TPOT:
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, train_size=0.75, test_size=0.25, random_state=42)
tpot = TPOTClassifier(generations=5, population_size=50, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
At the end, it scores the model on the test set without explicitly applying the transformations that were fit on the training set. I have a few questions:
- Does the "tpot" model object automatically apply any scaling or other transformations when .score or .predict is called on new out-of-sample data?
- If not, what's the proper way to apply those transformations to the test set before calling .score or .predict on it?
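
To make the question concrete, here is a minimal sketch of the behaviour I'm asking about, using a plain scikit-learn Pipeline instead of TPOT. The StandardScaler and LogisticRegression steps are my own illustrative choices, not something TPOT is known to pick. In a Pipeline, the scaler is fit on the training data only and is reapplied inside .score and .predict. My understanding is that TPOT exposes its best pipeline as a scikit-learn Pipeline through the fitted_pipeline_ attribute, so I would assume the same holds there, but I have not confirmed it.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, train_size=0.75, test_size=0.25, random_state=42)

# Illustrative stand-in for whatever pipeline TPOT might build (assumption):
# a scaler followed by a classifier, fit on the training data only.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_train, y_train)

# Scoring through the pipeline: the fitted scaler is applied to X_test internally.
pipeline_score = pipe.score(X_test, y_test)

# The same thing done by hand: transform X_test with the already-fitted scaler
# (transform, not fit_transform), then score the final estimator.
scaler = pipe.named_steps["standardscaler"]
model = pipe.named_steps["logisticregression"]
manual_score = model.score(scaler.transform(X_test), y_test)

print(pipeline_score, manual_score)
```

If the two printed numbers match, the pipeline is handling the test-set transformation itself, which is the behaviour I'm hoping tpot.score and tpot.predict also have.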
Please correct me if I'm completely misunderstanding this. Thank you.