When a formula contains categorical variables, patsy needs the full original dataset to rebuild the category levels and their encoding.
After the data has been transformed into a design matrix, is there a way to retrieve the levels and encoding patsy used for that data? I would like to avoid keeping the full dataset around just so that patsy can rebuild them.
The context is that I'm transforming training data into a design matrix with patsy during model training, and I would then like to know the levels and encoding so I can get a model prediction without having to keep the original training data around.
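
To make this concrete, here is a minimal sketch of what I'm after. The data, column names, and formula are made up for illustration. The `design_info` attribute and `build_design_matrices` look like they might be the relevant pieces, but I'm not sure this is the intended approach, and I have not checked whether the `design_info` object can be serialized.

```python
from patsy import dmatrix, build_design_matrices

# Hypothetical training data; column names are invented for this example.
train = {
    "color": ["red", "green", "blue", "green"],
    "x": [1.0, 2.0, 3.0, 4.0],
}

# Build the training design matrix from a formula with one categorical term.
train_dm = dmatrix("C(color) + x", train)

# The design matrix carries a DesignInfo object describing how it was built.
design_info = train_dm.design_info
column_names = design_info.column_names

# Collect the levels patsy recorded for each categorical factor.
levels = {
    factor.name(): info.categories
    for factor, info in design_info.factor_infos.items()
    if info.type == "categorical"
}

# Hypothetical new data for prediction; only one level is present here.
new = {"color": ["green"], "x": [10.0]}

# Reuse the stored design_info so the new row gets the same columns and
# encoding as the training matrix, without passing the training data again.
(new_dm,) = build_design_matrices([design_info], new)

print(column_names)
print(levels)
print(new_dm)
```

If this is right, only `design_info` would need to be kept after training, not `train` itself.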