Each row of my variable holds a list of ingredients separated by commas. I applied one-hot encoding for multiple values per row (MultiLabelBinarizer()), but this greatly increased the dimensionality of my dataset, since every distinct ingredient becomes its own column.
Is there a more appropriate encoding method for this situation?
My variable looks like this:
df['ingredients_str'].head()
0 romaine lettuce, black olives, grape tomatoes
1 plain flour,ground pepper,salt,tomatoes
2 eggs,pepper,salt,mayonaise,cooking oil
3 water,vegetable oil,wheat,salt
4 black pepper,shallots,cornflour,cayenne
Name: ingredients_str, dtype: object
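For reference, here is a minimal sketch of the encoding step described above. It rebuilds only the five rows shown and assumes the strings are split on commas with surrounding whitespace stripped. The names `ingredient_lists`, `mlb`, and `encoded` are illustrative and not from my actual code.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Rebuild the five sample rows shown above (the real dataset is larger).
df = pd.DataFrame({"ingredients_str": [
    "romaine lettuce, black olives, grape tomatoes",
    "plain flour,ground pepper,salt,tomatoes",
    "eggs,pepper,salt,mayonaise,cooking oil",
    "water,vegetable oil,wheat,salt",
    "black pepper,shallots,cornflour,cayenne",
]})

# Assumption: ingredients are comma-separated; strip stray spaces around each one.
ingredient_lists = df["ingredients_str"].apply(
    lambda s: [item.strip() for item in s.split(",") if item.strip()]
)

# One binary column per distinct ingredient across all rows.
mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(ingredient_lists)

encoded_df = pd.DataFrame(encoded, columns=mlb.classes_, index=df.index)
print(encoded_df.shape)  # (number of rows, number of distinct ingredients)
```

Even on these five rows the single text column turns into one column per distinct ingredient, which is the growth in dimensionality I am asking about.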