I have a dataset that looks like this:

| Amount | Source | y |
| -------- | ------ | - |
| 285 | a | 1 |
| 556 | b | 0 |
| 883 | c | 0 |
| 156 | c | 1 |
| 374 | a | 1 |
| 1520 | d | 0 |

'Source' is a categorical variable with the categories 'a', 'b', 'c' and 'd', so one-hot encoding it during training produces the columns 'source_a', 'source_b', 'source_c' and 'source_d'. I am using this model to predict y.

The new data for prediction does not contain every category seen in training; it only has 'a', 'c' and 'd'. When I one-hot encode this dataset, the 'source_b' column is missing. How do I transform this data so that it has the same columns as the training data?
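
A minimal reproduction of the mismatch, assuming pandas `get_dummies` is used for the encoding (the question does not say which encoder is used). The training rows come from the table above; the three rows in `new` are hypothetical values for illustration only.

```python
import pandas as pd

# Training data from the table above
train = pd.DataFrame({
    "Amount": [285, 556, 883, 156, 374, 1520],
    "Source": ["a", "b", "c", "c", "a", "d"],
    "y": [1, 0, 0, 1, 1, 0],
})

# Hypothetical new data: only categories 'a', 'c' and 'd' appear
new = pd.DataFrame({
    "Amount": [410, 92, 760],
    "Source": ["a", "c", "d"],
})

# prefix="source" yields column names like 'source_a'
X_train = pd.get_dummies(train.drop(columns="y"), columns=["Source"], prefix="source")
X_new = pd.get_dummies(new, columns=["Source"], prefix="source")

print(list(X_train.columns))  # includes 'source_b'
print(list(X_new.columns))    # no 'source_b', because 'b' never occurs in new
```

`get_dummies` only creates columns for the categories it actually sees, which is why the two frames end up with different columns.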

PS: I am using `XGBClassifier()` for prediction.
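
One commonly used way to align the columns, sketched here under the assumption that the list of training columns was saved after encoding: `DataFrame.reindex` adds any missing column (such as 'source_b') filled with 0 and drops columns the model never saw. The helper `align_to_training` and the sample values are hypothetical.

```python
import pandas as pd


def align_to_training(X_new: pd.DataFrame, train_columns) -> pd.DataFrame:
    """Return X_new with exactly the training columns, in training order.

    Columns missing from X_new (e.g. 'source_b') are added and filled with 0;
    columns not present in train_columns are dropped.
    """
    return X_new.reindex(columns=train_columns, fill_value=0)


# Hypothetical: columns recorded from the encoded training frame
train_columns = ["Amount", "source_a", "source_b", "source_c", "source_d"]

# Hypothetical encoded prediction data without 'source_b'
X_new = pd.DataFrame({
    "Amount": [410, 92, 760],
    "source_a": [1, 0, 0],
    "source_c": [0, 1, 0],
    "source_d": [0, 0, 1],
})

X_aligned = align_to_training(X_new, train_columns)
print(list(X_aligned.columns))
print(X_aligned["source_b"].tolist())
```

Reindexing also restores the training column order, which XGBoost generally needs when predicting. An alternative is to fit scikit-learn's `OneHotEncoder(handle_unknown="ignore")` on the training data and reuse that fitted encoder at prediction time, since it stores the categories it was fitted on.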