I have a dataset of 324 rows and 35 columns. I split it into training and testing data:
X_train, X_test, y_train, y_test = train_test_split(tempCSV[feature_names[0:34]], tempCSV[feature_names[34]], test_size=0.2, random_state=32)
This seems to work fine, and both X_train and X_test have 34 features. I then apply a further transformation with DictVectorizer, since some of my variables are categorical:
from sklearn.feature_extraction import DictVectorizer
vecS=DictVectorizer(sparse=False)
X_train=vecS.fit_transform(X_train.to_dict(orient='records'))
X_test=vecS.fit_transform(X_test.to_dict(orient='records'))
Now when I compare the transformed X_train and X_test, the former has 46 features while the latter has only 44. What are some possible reasons this could happen?